Article translation: Chapter 7, recipes 10-12

10  Measuring prediction performance

A receiver operating characteristic (ROC) curve illustrates the performance of a binary classifier by plotting the true positive rate against the false positive rate at different cut points. The plot is most commonly used to calculate the area under the curve (AUC), which measures the performance of a classification model. In this recipe, we will demonstrate how to plot an ROC curve and calculate the AUC to measure the performance of a classification model.

Getting ready

In this recipe, we will continue using the telecom churn dataset as our example dataset.

How to do it...

Perform the following steps to plot an ROC curve and calculate the AUC for a classification model:

1.     First, install and load the ROCR package:

> install.packages("ROCR")

> library(ROCR)

  

2.     Train the svm model on the training dataset with probability set to TRUE:

> svmfit = svm(churn ~ ., data = trainset, probability = TRUE)

  

3.     Make predictions on the testing dataset with the trained model, with probability set to TRUE:

> pred = predict(svmfit, testset[, !names(testset) %in% c("churn")], probability = TRUE)

  

4.     Obtain the predicted probability of the yes label:

> pred.prob = attr(pred, "probabilities")

> pred.to.roc = pred.prob[, 2]

  

5.     Use the prediction function to generate a prediction result:

> pred.rocr = prediction(pred.to.roc, testset$churn)

  

6.     Use the performance function to obtain the performance measurement:

> perf.rocr = performance(pred.rocr, measure = "auc", x.measure = "cutoff")

> perf.tpr.rocr = performance(pred.rocr, "tpr","fpr")

  

7.     Visualize the ROC curve using the plot function:

> plot(perf.tpr.rocr, colorize=T, main=paste("AUC:", (perf.rocr@y.values)))

  

Figure 6: The ROC curve for the svm classifier performance

How it works...

In this recipe, we demonstrated how to generate an ROC curve to illustrate the performance of a binary classifier. First, we install and load the ROCR library. Then, we use svm, from the e1071 package, to train a classification model, and use the model to predict labels for the testing dataset. Next, we use the prediction function (from the ROCR package) to generate prediction results. We then use the performance function to obtain the AUC and the true positive rate against the false positive rate. Finally, we use the plot function to visualize the ROC curve, and add the AUC value to the title. In this example, the AUC value is 0.92, which indicates that the svm classifier performs well in classifying the telecom churn dataset.
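To make the relationship between cut points, the ROC curve, and the AUC concrete, the following is a minimal base R sketch that computes them by hand. It does not use ROCR, and the scores and labels are hypothetical toy values rather than the churn data; the helper names roc_points and auc_trapezoid are made up for this illustration.

```r
# Toy data (hypothetical, for illustration only): predicted probability of the
# positive class and the true labels.
scores <- c(0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10)
labels <- c("yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "no")

roc_points <- function(scores, labels, positive = "yes") {
  is_pos <- labels == positive
  # Use every distinct score as a cut point, from strictest to most lenient.
  cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))
  # True positive rate: share of positives scored at or above the cut point.
  tpr <- sapply(cutoffs, function(cut) sum(scores >= cut & is_pos) / sum(is_pos))
  # False positive rate: share of negatives scored at or above the cut point.
  fpr <- sapply(cutoffs, function(cut) sum(scores >= cut & !is_pos) / sum(!is_pos))
  data.frame(cutoff = cutoffs, fpr = fpr, tpr = tpr)
}

auc_trapezoid <- function(roc) {
  # Area under the piecewise-linear ROC curve (trapezoidal rule).
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(scores, labels)
auc <- auc_trapezoid(roc)
print(roc)
cat("AUC:", auc, "\n")
```

Each row of roc is one point on the curve; plotting tpr against fpr gives the same kind of picture that the performance and plot functions produce in the recipe.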

See also

For those interested in the concepts and terminology of ROC, refer to

http://en.wikipedia.org/wiki/Receiver_operating_characteristic

11  Comparing an ROC curve using the caret package

In previous chapters, we introduced many classification methods; each method has its own advantages and disadvantages. However, to choose the best fitted model, you need to compare the performance measures generated from different prediction models. To make the comparison easy, the caret package allows us to generate and compare the performance of models. In this recipe, we will use the functions provided by the caret package to compare models trained with different algorithms on the same dataset.

Getting ready

Here, we will continue to use the telecom dataset as our input data source.

How to do it...

Perform the following steps to generate an ROC curve of each fitted model:

1.     Install and load the pROC library:

> install.packages("pROC")

> library("pROC")

  

2.     Set up the training control with a 10-fold cross-validation in 3 repetitions:

> control = trainControl(method = "repeatedcv",

+                            number = 10,

+                            repeats = 3,

+                            classProbs = TRUE,

+                            summaryFunction = twoClassSummary)

  

3.     Then, you can train a classifier on the training dataset using glm:

> glm.model= train(churn ~ .,

+                     data = trainset,

  

1.     Resample the three generated models:

> cv.values = resamples(list(glm = glm.model, svm=svm.model, rpart

= rpart.model))

  

2.     Then, you can obtain a summary of the resampling result:

> summary(cv.values)
  

Call:
summary.resamples(object = cv.values)

Models: glm, svm, rpart
Number of resamples: 30

ROC
        Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glm   0.7206  0.7847 0.8126 0.8116  0.8371 0.8877    0
svm   0.8337  0.8673 0.8946 0.8929  0.9194 0.9458    0
rpart 0.2802  0.7159 0.7413 0.6769  0.8105 0.8821    0

Sens
         Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glm   0.08824  0.2000 0.2286 0.2194  0.2517 0.3529    0
svm   0.44120  0.5368 0.5714 0.5866  0.6424 0.7143    0
rpart 0.20590  0.3742 0.4706 0.4745  0.5929 0.6471    0

Spec
        Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glm   0.9442  0.9608 0.9746 0.9701  0.9797 0.9949    0
svm   0.9442  0.9646 0.9746 0.9740  0.9835 0.9949    0
rpart 0.9492  0.9709 0.9797 0.9780  0.9848 0.9949    0

  

3.     Use dotplot to plot the resampling result in the ROC metric:

> dotplot(cv.values, metric = "ROC")

  

4.     Also, you can use a box-whisker plot to plot the resampling result:

> bwplot(cv.values, layout = c(3, 1))

  

How it works...

In this recipe, we demonstrate how to measure the performance differences among three fitted models using the resampling method. First, we use the resamples function to collect the resampling statistics of each fitted model (svm.model, glm.model, and rpart.model). Then, we use the summary function to obtain the statistics of these three models in the ROC, sensitivity, and specificity metrics. Next, we apply a dotplot to the resampling result to see how ROC varies between the models. Last, we draw a box-whisker plot of the resampling results to show the ROC, sensitivity, and specificity metrics of the different models on a single plot.
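The resampling idea can be sketched in base R without caret. The example below is only an illustration under stated assumptions: the data are simulated, a fixed 0.5 threshold on a noisy score stands in for a trained model, and yes is treated as the positive class. It shows why 10 folds repeated 3 times give 30 resamples, and how sensitivity and specificity are summarized across them.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical toy data: a two-class outcome and one noisy score.
n <- 200
truth <- factor(sample(c("yes", "no"), n, replace = TRUE), levels = c("yes", "no"))
score <- ifelse(truth == "yes", 0.65, 0.35) + rnorm(n, sd = 0.2)

k <- 10        # number of folds
repeats <- 3   # number of repetitions

results <- data.frame()
for (r in seq_len(repeats)) {
  # Randomly assign each row to one of k folds for this repetition.
  fold <- sample(rep(seq_len(k), length.out = n))
  for (f in seq_len(k)) {
    held_out <- fold == f
    # Stand-in for a trained model: a fixed threshold on the score.
    predicted <- ifelse(score[held_out] > 0.5, "yes", "no")
    actual <- truth[held_out]
    # Sensitivity: share of actual "yes" cases predicted as "yes".
    sens <- sum(predicted == "yes" & actual == "yes") / sum(actual == "yes")
    # Specificity: share of actual "no" cases predicted as "no".
    spec <- sum(predicted == "no" & actual == "no") / sum(actual == "no")
    results <- rbind(results, data.frame(rep = r, fold = f, Sens = sens, Spec = spec))
  }
}

print(nrow(results))          # one row per resample: k * repeats
print(summary(results$Sens))  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
print(summary(results$Spec))
```

The six-number summaries printed at the end have the same layout as the per-model rows in the summary output of the recipe, except that here there is a single stand-in model.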

See also

Besides using dotplot and bwplot to measure performance differences, one can use densityplot, splom, and xyplot to visualize the performance differences of each fitted model in the ROC, sensitivity, and specificity metrics.

12   Measuring performance differences between models with the caret package

In the previous recipe, we introduced how to generate an ROC curve for each fitted model and plot the curves on the same figure. Apart from using an ROC curve, one can use the resampling method to generate statistics of each fitted model in the ROC, sensitivity, and specificity metrics. We can then use these statistics to compare the performance differences between the models. In the following recipe, we will introduce how to measure performance differences between fitted models with the caret package.

Getting ready

One needs to have completed the previous recipe by storing the glm fitted model, the svm fitted model, and the rpart fitted model in glm.model, svm.model, and rpart.model, respectively.

How to do it...

Perform the following steps to measure performance differences between the fitted models:

-------- Excerpted from Baidu Translate

叶新颍

Time: 2024-10-24 21:29:10
