XGBOOST使用指南

XGBOOST使用指南

    <div id="article_content" class="article_content clearfix csdn-tracking-statistics" data-pid="blog" data-mod="popu_307" data-dsm="post">
                    <div class="htmledit_views">
            <h2 style="font-family:‘Helvetica Neue‘, Helvetica, Arial, sans-serif;line-height:1;color:rgb(0,0,0);"><a name="t0"></a><span style="font-size:16px;">一、导入必要的工具包</span></h2><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># 导入必要的工具包</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">import</span> xgboost <span class="hljs-keyword">as</span> xgb</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># 计算分类正确率</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><span style="font-size:16px;">二、<span style="color:rgb(0,0,0);font-family:‘Helvetica Neue‘, Helvetica, Arial, sans-serif;"><strong>数据读取</strong></span></span><br>XGBoost可以加载libsvm格式的文本数据,libsvm的文件格式(稀疏特征)如下:<br><span style="color:#ff0000;"><strong>1</strong></span>&nbsp;&nbsp;<strong>101</strong>:1.2 <strong>102</strong>:0.03<br><span style="color:#cc0000;"><strong>0&nbsp;</strong></span> <strong>1</strong>:2.1 <strong>10001</strong>:300 <strong>10002</strong>:400<br>...<br>每一行表示一个样本,第一行的开头的<strong>“1”是样本的标签</strong>。<strong>“101”和“102”为特征索引,</strong><span style="color:#cc0000;"><strong>‘1.2‘和‘0.03‘ 为特征的值。</strong></span><br><p>在两类分类中,用<strong>“1”</strong>表示正样本,用<strong>“0”</strong> 表示负样本。也支持[0,1]表示概率用来做标签,表示为正样本的概率。</p>下面的示例数据需要我们通过一些蘑菇的若干属性判断这个品种是否有毒。<br>UCI数据描述:http://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/ ,<br>每个样本描述了蘑菇的22个属性,比如形状、气味等等(<strong>将22维原始特征用加工后变成了126维特征,<br></strong><p><strong>并存为libsvm格式</strong>),然后给出了这个蘑菇是否可食用。其中6513个样本做训练,1611个样本做测试。</p><p><strong><span style="color:#cc0000;">注:libsvm格式文件说明如下</span>&nbsp;<a href="https://www.cnblogs.com/codingmengmeng/p/6254325.html" rel="nofollow" target="_blank">https://www.cnblogs.com/codingmengmeng/p/6254325.html</a></strong></p>XGBoost加载的数据存储在对象DMatrix中<br>XGBoost自定义了一个<strong><span style="color:#cc0000;">数据矩阵类DMatrix</span></strong>,优化了存储和运算速度<br><p>DMatrix文档:http://xgboost.readthedocs.io/en/latest/python/python_api.html</p><p>数据下载地址:<a href="http://download.csdn.net/download/u011630575/10266113" rel="nofollow" target="_blank">http://download.csdn.net/download/u011630575/10266113</a></p><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># read in data,数据在xgboost安装的路径下的demo目录,现在我们将其copy到当前代码下的data目录</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">my_workpath = <span class="hljs-string">‘./data/‘</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">dtrain = xgb.DMatrix(my_workpath + <span class="hljs-string">‘agaricus.txt.train‘</span>)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">dtest = xgb.DMatrix(my_workpath + <span class="hljs-string">‘agaricus.txt.test‘</span>)</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><p><span style="color:rgb(0,0,0);font-family:‘Helvetica Neue‘, Helvetica, Arial, sans-serif;font-size:14px;text-align:left;background-color:rgb(255,255,255);">查看数据情况</span></p><pre onclick="hljs.copyCode(event)"><code class="language-python hljs">dtrain.num_col()</code><div class="hljs-button" data-title="复制"></div></pre><pre onclick="hljs.copyCode(event)"><code class="language-python hljs">dtrain.num_row()</code><div class="hljs-button" data-title="复制"></div></pre><pre onclick="hljs.copyCode(event)"><code class="language-python hljs">dtest.num_row()</code><div class="hljs-button" data-title="复制"></div></pre><span style="font-size:16px;">三、训练参数设置</span><br><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">max_depth: 树的最大深度。缺省值为<span class="hljs-number">6</span>,取值范围为:[<span class="hljs-number">1</span>,∞]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">eta:为了防止过拟合,更新过程中用到的收缩步长。在每次提升计算之后,算法会直接获得新特征的权重。 </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">eta通过缩减特征的权重使提升计算过程更加保守。缺省值为<span class="hljs-number">0.3</span>,取值范围为:[<span class="hljs-number">0</span>,<span class="hljs-number">1</span>]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">silent:取<span class="hljs-number">0</span>时表示打印出运行时信息,取<span class="hljs-number">1</span>时表示以缄默方式运行,不打印运行时信息。缺省值为<span class="hljs-number">0</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">objective: 定义学习任务及相应的学习目标,“binary:logistic” 表示二分类的逻辑回归问题,输出为概率。</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">其他参数取默认值。</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># specify parameters via map</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">param = {<span class="hljs-string">‘max_depth‘</span>:<span class="hljs-number">2</span>, <span class="hljs-string">‘eta‘</span>:<span class="hljs-number">1</span>, <span class="hljs-string">‘silent‘</span>:<span class="hljs-number">0</span>, <span class="hljs-string">‘objective‘</span>:<span class="hljs-string">‘binary:logistic‘</span> }</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">print(param)</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><p><span style="font-size:16px;">四、训练模型</span></p><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># 设置boosting迭代计算次数</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">num_round = <span class="hljs-number">2</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">import</span> time</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">starttime = time.clock()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">bst = xgb.train(param, dtrain, num_round) <span class="hljs-comment">#  dtrain是训练数据集</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">endtime = time.clock()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="10"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">print</span> (endtime - starttime)</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre>XGBoost预测的输出是概率。这里蘑菇分类是一个二类分类问题,<strong><span style="color:#cc0000;">输出值是样本为第一类的概率。</span></strong><br><p><span style="color:#ff0000;">我们需要将概率值转换为0或1。</span></p><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">train_preds = bst.predict(dtrain)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">train_predictions = [round(value) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> train_preds]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">y_train = dtrain.get_label() <span class="hljs-comment">#值为输入数据的第一行</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">train_accuracy = accuracy_score(y_train, train_predictions)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">print</span> (<span class="hljs-string">"Train Accuary: %.2f%%"</span> % (train_accuracy * <span class="hljs-number">100.0</span>))</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><p><span style="font-size:16px;">五、测试</span></p><p>模型训练好后,可以用训练好的模型对测试数据进行预测<br></p><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># make prediction</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">preds = bst.predict(dtest)</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre>检查模型在测试集上的正确率<br><p>XGBoost预测的输出是概率,输出值是样本为第一类的概率。我们需要将概率值转换为0或1。</p><pre onclick="hljs.copyCode(event)"><code class="language-python hljs">predictions = [round(value) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> preds]</code><div class="hljs-button" data-title="复制"></div></pre><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">y_test = dtest.get_label()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">test_accuracy = accuracy_score(y_test, predictions)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">print(<span class="hljs-string">"Test Accuracy: %.2f%%"</span> % (test_accuracy * <span class="hljs-number">100.0</span>))</div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><h2 style="font-family:‘Helvetica Neue‘, Helvetica, Arial, sans-serif;line-height:1;color:rgb(0,0,0);"><a name="t1"></a><span style="font-weight:normal;"><span style="font-size:16px;">六、模型可视化</span></span></h2><span style="font-size:14px;">调用XGBoost工具包中的plot_tree,在显示<br>要可视化模型需要安装graphviz软件包<br>plot_tree()的三个参数:<br>1. 模型<br>2. 树的索引,从0开始<br>3. 显示方向,缺省为竖直,‘LR‘是水平方向</span><br><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">from</span> matplotlib <span class="hljs-keyword">import</span> pyplot</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">import</span> graphviz</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">xgb.plot_tree(bst, num_trees=<span class="hljs-number">0</span>, rankdir= <span class="hljs-string">‘LR‘</span> )</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">pyplot.show()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">#xgb.plot_tree(bst,num_trees=1, rankdir= ‘LR‘ )</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">#pyplot.show()</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">#xgb.to_graphviz(bst,num_trees=0)</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">#xgb.to_graphviz(bst,num_trees=1)</span></div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><p>七、代码整理</p><pre onclick="hljs.copyCode(event)"><code class="language-python hljs"><ol class="hljs-ln"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># coding:utf-8</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">import</span> xgboost <span class="hljs-keyword">as</span> xgb</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># 计算分类正确率</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># read in data,数据在xgboost安装的路径下的demo目录,现在我们将其copy到当前代码下的data目录</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">my_workpath = <span class="hljs-string">‘./data/‘</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">dtrain = xgb.DMatrix(my_workpath + <span class="hljs-string">‘agaricus.txt.train‘</span>)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="10"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">dtest = xgb.DMatrix(my_workpath + <span class="hljs-string">‘agaricus.txt.test‘</span>)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="11"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="12"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">dtrain.num_col()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="13"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="14"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">dtrain.num_row()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="15"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="16"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">dtest.num_row()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="17"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="18"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># specify parameters via map</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="19"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">param = {<span class="hljs-string">‘max_depth‘</span>:<span class="hljs-number">2</span>, <span class="hljs-string">‘eta‘</span>:<span class="hljs-number">1</span>, <span class="hljs-string">‘silent‘</span>:<span class="hljs-number">0</span>, <span class="hljs-string">‘objective‘</span>:<span class="hljs-string">‘binary:logistic‘</span> }</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="20"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">print(param)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="21"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="22"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># 设置boosting迭代计算次数</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="23"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">num_round = <span class="hljs-number">2</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="24"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="25"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">import</span> time</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="26"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="27"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">starttime = time.clock()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="28"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="29"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">bst = xgb.train(param, dtrain, num_round)  <span class="hljs-comment"># dtrain是训练数据集</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="30"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="31"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">endtime = time.clock()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="32"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">print</span> (endtime - starttime)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="33"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="34"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="35"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">train_preds = bst.predict(dtrain)    <span class="hljs-comment">#</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="36"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">print</span> (<span class="hljs-string">"train_preds"</span>,train_preds)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="37"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="38"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">train_predictions = [round(value) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> train_preds]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="39"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">print</span> (<span class="hljs-string">"train_predictions"</span>,train_predictions)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="40"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="41"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">y_train = dtrain.get_label()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="42"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">print</span> (<span class="hljs-string">"y_train"</span>,y_train)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="43"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="44"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">train_accuracy = accuracy_score(y_train, train_predictions)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="45"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">print</span> (<span class="hljs-string">"Train Accuary: %.2f%%"</span> % (train_accuracy * <span class="hljs-number">100.0</span>))</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="46"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="47"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="48"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># make prediction</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="49"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">preds = bst.predict(dtest)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="50"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">predictions = [round(value) <span class="hljs-keyword">for</span> value <span class="hljs-keyword">in</span> preds]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="51"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="52"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">y_test = dtest.get_label()</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="53"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="54"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">test_accuracy = accuracy_score(y_test, predictions)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="55"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">print(<span class="hljs-string">"Test Accuracy: %.2f%%"</span> % (test_accuracy * <span class="hljs-number">100.0</span>))</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="56"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="57"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># from matplotlib import pyplot</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="58"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># import graphviz</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="59"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="60"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">import</span> graphviz</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="61"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="62"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># xgb.plot_tree(bst, num_trees=0, rankdir=‘LR‘)</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="63"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># pyplot.show()</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="64"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="65"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># xgb.plot_tree(bst,num_trees=1, rankdir= ‘LR‘ )</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="66"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># pyplot.show()</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="67"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># xgb.to_graphviz(bst,num_trees=0)</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="68"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment"># xgb.to_graphviz(bst,num_trees=1)</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="69"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li></ol></code><div class="hljs-button" data-title="复制"></div></pre><p><br></p><p><br></p>            </div>
            </div>

原文地址:https://www.cnblogs.com/guanzhicheng/p/9419582.html

时间: 2024-11-08 20:48:18

XGBOOST使用指南的相关文章

XGBoost参数调优完全指南(附Python代码)

XGBoost参数调优完全指南(附Python代码):http://www.2cto.com/kf/201607/528771.html https://www.zhihu.com/question/41354392 [以下转自知乎] https://www.zhihu.com/question/45487317 为什么xgboost/gbdt在调参时为什么树的深度很少就能达到很高的精度? 参加kaggle的时候,用xgboost/gbdt在在调参的时候把树的最大深度调成6就有很高的精度了.但是

【转】XGBoost参数调优完全指南(附Python代码)

xgboost入门非常经典的材料,虽然读起来比较吃力,但是会有很大的帮助: 英文原文链接:https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/ 原文地址:Complete Guide to Parameter Tuning in XGBoost (with codes in Python) 译注:文内提供的代码和运行结果有一定差异,可以从这里下

xgboost 参数调优指南

一.XGBoost的优势 XGBoost算法可以给预测模型带来能力的提升.当我对它的表现有更多了解的时候,当我对它的高准确率背后的原理有更多了解的时候,我发现它具有很多优势: 1 正则化 标准GBDT 的实现没有像XGBoost这样的正则化步骤.正则化对减少过拟合也是有帮助的. 实际上,XGBoost以"正则化提升(regularized boosting)"技术而闻名. 2 并行处理 XGBoost可以实现并行处理,相比GBDT有了速度的飞跃. 不过,众所周知,Boosting算法是

机器学习系列(12)_XGBoost参数调优完全指南(附Python代码)

https://blog.csdn.net/han_xiaoyang/article/details/52665396 转: 原文地址:Complete Guide to Parameter Tuning in XGBoost by Aarshay Jain 原文翻译与校对:@MOLLY && 寒小阳 ([email protected]) 时间:2016年9月. 出处:http://blog.csdn.net/han_xiaoyang/article/details/52665396 声

最全解析如何正确学习JavaScript指南,必看!

划重点 鉴于时不时,有同学私信问我:怎么学前端的问题.这里统一回复一下,如下次再遇到问我此问题同学,就直接把本文链接地址发给你了. "前端怎么学"应该因人而异,别人的方法未必适合自己.就说说我的学习方法吧:我把大部分时间放在学习js上了.因为这个js的学习曲线,先平后陡.项目实践和练习啥的,我不说了,主要说下工作之外的时间利用问题.我是怎么学的呢,看书,分析源码.个人这几天统计了一下,前端书籍目前看了50多本吧,大部分都是js的.市面上的书基本,差不多都看过. 第一个问题:看书有啥好处

微信Android接入指南

注:本文为微信Android终端开发工具的新手使用教程,只涉及教授SDK的使用方法,默认读者已经熟悉IDE的基本使用方法(本文以Eclipse为例),以及具有一定的编程知识基础等. 1.申请你的AppID 请到 开发者应用登记页面 进行登记,登记并选择移动应用进行设置后,将该应用提交审核,只有审核通过的应用才能进行开发. 2.下载微信终端开发工具包 开发工具包主要包含3部分内容:(其中,只有libammsdk.jar是必须的) - libammsdk.jar(每个第三方应用必须要导入该sdk库,

【资源共享】Rockchip I2C 开发指南 V1.0

2C设备的设备应用非常广泛,常见的包含重力传感器,触摸屏驱动芯片,音频解码等 这个文档是RK3399的I2C开发文档:<Rockchip I2C 开发指南 V1.0> 内容预览: 下载地址:http://developer.t-firefly.com/thread-12495-1-1.html

Spark性能优化指南——高级篇

Spark性能优化指南--高级篇 [TOC] 前言 继基础篇讲解了每个Spark开发人员都必须熟知的开发调优与资源调优之后,本文作为<Spark性能优化指南>的高级篇,将深入分析数据倾斜调优与shuffle调优,以解决更加棘手的性能问题. 数据倾斜调优 调优概述 有的时候,我们可能会遇到大数据计算中一个最棘手的问题--数据倾斜,此时Spark作业的性能会比期望差很多.数据倾斜调优,就是使用各种技术方案解决不同类型的数据倾斜问题,以保证Spark作业的性能. 数据倾斜发生时的现象 绝大多数tas

&quot;WannaCry&quot;勒索病毒用户处置指南

"WannaCry"勒索病毒用户处置指南 原文: http://mp.weixin.qq.com/s/ExsribKum9-AN1ToT10Zog 前言:北京时间2017年5月12日晚,勒索软件"WannaCry"感染事件爆发,全球范围内99个国家遭到大规模网络攻击,被攻击者电脑中的文件被加密,被要求支付比特币以解密文件:众多行业受到影响,比如英国的NHS服务,导致至少40家医疗机构内网被攻陷,电脑被加密勒索:而我国众多行业也是如此,其中又以教育网最为显著,导致部分