
Decision Tree Model and Learning

1. Definition

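In general, a decision tree consists of one root node, a number of internal nodes, and a number of leaf nodes. Each leaf node corresponds to a decision outcome, and every other node corresponds to a test on an attribute. The samples held by a node are distributed among its child nodes according to the result of that attribute test. The root node holds the full sample set, and the path from the root to any leaf corresponds to a sequence of tests.

The definition above may be somewhat abstract, so I sketched a quick figure to make it more concrete:

As the figure shows, this decision tree is built on gender, height, and cooking skill. Each of these attributes is tested in turn, and the leaf node finally reached decides whether the candidate meets the man's criteria for a blind date.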
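To make the structure tangible, here is a minimal sketch of such a tree as a nested Python dictionary, together with a function that walks from the root to a leaf. Only the three attribute names come from the text; the branch values (`"female"`, `"tall"`, `"good"`, and so on), the outcomes, and the names `tree` and `classify` are assumptions made up for illustration, since the original figure is not reproduced here.

```python
# Hypothetical tree: attribute names follow the text; branch values and
# outcomes are invented for illustration only.
tree = {
    "attribute": "gender",
    "branches": {
        "male": "no",
        "female": {
            "attribute": "height",
            "branches": {
                "short": "no",
                "tall": {
                    "attribute": "cooking",
                    "branches": {"good": "yes", "poor": "no"},
                },
            },
        },
    },
}


def classify(node, sample):
    """Follow attribute tests from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]  # result of this node's attribute test
        node = node["branches"][value]     # descend into the matching child
    return node


candidate = {"gender": "female", "height": "tall", "cooking": "good"}
print(classify(tree, candidate))  # yes
```

Each internal node stores the attribute it tests, and each leaf is just the decision, so the root-to-leaf path is exactly the sequence of tests described in the definition.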

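2. Basic Algorithm

A decision tree learning algorithm is usually a recursive procedure: it selects the best feature, splits the training data on that feature so that each resulting subset is classified as well as possible, and repeats. This procedure corresponds both to a partitioning of the feature space and to the construction of the tree. Common algorithms include ID3, C4.5, and CART (Classification And Regression Tree), as well as tree-based ensembles such as Random Forest.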

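First, build the root node and place all of the training data in it;
Select the best feature and split the training data into subsets on that feature, so that each subset is classified as well as possible under the current conditions;
If a subset can already be classified essentially correctly, build a leaf node and assign that subset to it;
If a subset still cannot be classified essentially correctly, select a new best feature for it, split it again, and build the corresponding node;
Continue recursively until every subset of the training data is classified essentially correctly, or until no suitable feature remains. In the end every subset has been assigned to a leaf node and therefore has a definite class, which yields a decision tree.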
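The steps above can be sketched as a short recursive function. This is a simplified illustration, not any specific published algorithm: it treats "essentially correctly classified" as "all labels in the subset are identical", and it uses a placeholder criterion, `majority_hits` (the number of samples a majority vote in each subset would get right), to pick the best feature. All function names and the toy data are assumptions; the `score` parameter can be swapped for the information-based criterion discussed in section 3.

```python
from collections import Counter


def majority(labels):
    """Most frequent label, used as the prediction of a leaf."""
    return Counter(labels).most_common(1)[0][0]


def split(rows, labels, feature):
    """Group rows and labels by the value each row takes for `feature`."""
    subsets = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[feature], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return subsets


def majority_hits(rows, labels, feature):
    """Placeholder criterion: samples classified correctly by per-subset majority vote."""
    return sum(
        Counter(sub_labels).most_common(1)[0][1]
        for _, sub_labels in split(rows, labels, feature).values()
    )


def build_tree(rows, labels, features, score=majority_hits):
    # Stop when the subset is pure or no feature is left: build a leaf.
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    # Otherwise pick the best feature and recurse on each subset.
    best = max(features, key=lambda f: score(rows, labels, f))
    remaining = [f for f in features if f != best]
    return {
        "attribute": best,
        "branches": {
            value: build_tree(sub_rows, sub_labels, remaining, score)
            for value, (sub_rows, sub_labels) in split(rows, labels, best).items()
        },
    }


rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(build_tree(rows, labels, ["outlook", "windy"]))
```

On this toy data, `outlook` separates the two classes perfectly, so it is chosen at the root and both children become leaves.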

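3. Split Selection

The crux of the decision tree algorithm is line 8 of the figure above, "select the optimal splitting attribute", that is, how to choose the attribute to split on. In general, as the tree grows deeper, we want the samples in each branch node to belong to the same class as far as possible (otherwise there is little point in creating a branch node for a handful of samples). In other words, we want the "purity" of the nodes to keep increasing.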

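Information entropy is a measure designed to quantify the purity of a sample set. Let $p(x_i)$ denote the proportion of class-$i$ samples in the current set $X$. The entropy is then:

$$H(X) = -\sum_{i} p(x_i) \log_2 p(x_i)$$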

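The smaller the entropy, the higher the purity of the set $X$; the larger the entropy, the greater the uncertainty of the random variable.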
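As a quick check of this behaviour, the following sketch computes the entropy of a list of class labels. The function name `entropy` and the toy label lists are assumptions for illustration; the sum is written as $p \log_2(1/p)$, which is equal to $-p \log_2 p$.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of the class distribution in `labels`."""
    total = len(labels)
    # p * log2(1/p) equals -p * log2(p); only classes that occur are counted.
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(labels).values()
    )


print(entropy(["yes", "yes", "no", "no"]))    # 1.0 (evenly mixed, least pure)
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (a single class, fully pure)
```

A set with only one class has entropy 0, while an even two-class mix reaches the two-class maximum of 1 bit.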

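The information gain obtained by splitting on an attribute $T$ is:

$$IG(T) = H(X) - \sum_{t \in T} p(t)\, H(t)$$

Here $t$ is one of the subsets produced by splitting on $T$. Because branch nodes hold different numbers of samples, each branch is weighted by $p(t)$, so branches with more samples have more influence. In general, the larger the information gain, the greater the purity improvement from splitting on that attribute: $H(X)$ is fixed for the current set, so a larger gain means a smaller weighted entropy across the subsets, and therefore higher purity. We can thus use information gain to choose the splitting attribute of a decision tree.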
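A small sketch of this computation, under the assumption that an attribute is given as a list of values aligned with the list of class labels. The names `entropy` and `information_gain` and the toy data are illustrative, not taken from the original post.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of the class distribution in `labels`."""
    total = len(labels)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(labels).values()
    )


def information_gain(labels, attribute_values):
    """H(X) minus the p(t)-weighted entropy of the subsets induced by the attribute."""
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(t) / total * entropy(t) for t in subsets.values())
    return entropy(labels) - weighted


labels = ["yes", "yes", "no", "no"]
print(information_gain(labels, ["a", "a", "b", "b"]))  # 1.0 (perfect split)
print(information_gain(labels, ["a", "b", "a", "b"]))  # 0.0 (split tells us nothing)
```

The first attribute separates the classes completely, so the gain equals the full entropy of the set; the second leaves every subset as mixed as the original, so the gain is zero.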

Original article: https://www.cnblogs.com/George1994/p/8543420.html

Time: 2024-10-10 09:04:05
