Article Translation: Chapter 7, Recipes 4-6

4. Performing cross-validation with the caret package

The caret (classification and regression training) package contains many functions for the training process of regression and classification problems. Similar to the e1071 package, it also contains a function to perform k-fold cross-validation. In this recipe, we will demonstrate how to perform k-fold cross-validation using the caret package.

Getting ready

In this recipe, we will continue to use the telecom churn dataset as the input data source to perform the k-fold cross-validation.
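The text does not repeat how trainset was built; the following is a minimal sketch of one plausible setup, assuming the churn split used in the earlier recipes of this chapter (churnTrain from the C50 package and the names trainset/testset are assumptions taken from that context, and recent C50 releases may no longer bundle the churn data):

> library(C50)        # provides churnTrain in older releases (an assumption)
> library(caret)
> data(churn)
> set.seed(2)
> ind = sample(2, nrow(churnTrain), replace = TRUE, prob = c(0.7, 0.3))
> trainset = churnTrain[ind == 1, ]
> testset  = churnTrain[ind == 2, ]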

How to do it...

Perform the following steps to perform the k-fold cross-validation with the caret package:

1. First, set up the control parameter to train with the 10-fold cross-validation in 3 repetitions:

> control = trainControl(method="repeatedcv", number=10, repeats=3)

2. Then, you can train the classification model on the telecom churn data with rpart:

> model = train(churn~., data=trainset, method="rpart", preProcess="scale", trControl=control)

3. Finally, you can examine the output of the generated model:

> model
CART

2315 samples
  16 predictor
   2 classes: 'yes', 'no'

Pre-processing: scaled
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 2084, 2083, 2082, 2084, 2083, 2084, ...
Resampling results across tuning parameters:

  cp      Accuracy  Kappa  Accuracy SD  Kappa SD
  0.0556  0.904     0.531  0.0236       0.155
  0.0746  0.867     0.269  0.0153       0.153
  0.0760  0.860     0.212  0.0107       0.141

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.05555556.
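As an aside not covered by the original recipe, the fitted train object can be inspected further: bestTune holds the selected complexity parameter, finalModel the underlying rpart tree, and plot draws accuracy against cp.

> model$bestTune     # the cp value caret selected
> model$finalModel   # the rpart tree refit on the full training set
> plot(model)        # accuracy versus the complexity parameter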

How it works...

In this recipe, we demonstrate how convenient it is to conduct the k-fold cross-validation using the caret package. In the first step, we set up the training control and select the option to perform the 10-fold cross-validation in three repetitions. The process of repeating the k-fold validation is called repeated k-fold validation, which is used to test the stability of the model; if the model is stable, one should get similar test results. Then, we apply rpart on the training dataset with the option to scale the data, and train the model with the options configured in the previous step.
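One way to check that stability claim (a sketch, assuming the train object created above) is to look at the per-fold results that caret keeps for the selected cp value and see how much the accuracy varies across the 30 resamples:

> head(model$resample)          # Accuracy/Kappa for each fold and repeat
> sd(model$resample$Accuracy)   # a small spread suggests a stable model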

See also

You can configure the resampling function in trainControl, in which you can specify boot, boot632, cv, repeatedcv, LOOCV, LGOCV, none, oob, adaptive_cv, adaptive_boot, or adaptive_LGOCV. To view more detailed information on how to choose the resampling method, view the trainControl documentation:

> ?trainControl
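For illustration only (these alternatives are not used in the recipe), the same trainControl call accepts other resampling methods, for example bootstrap resampling or leave-one-out cross-validation:

> boot_control  = trainControl(method="boot", number=25)
> loocv_control = trainControl(method="LOOCV")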

5. Ranking the variable importance with the rminer package

Besides using the caret package to generate variable importance, you can use the rminer package to generate the variable importance of a classification model. In the following recipe, we will illustrate how to use rminer to obtain the variable importance of a fitted model.

Getting ready

In this recipe, we will continue to use the telecom churn dataset as the input data source to rank the variable importance.

How to do it...

Perform the following steps to rank the variable importance with rminer:

1. Install and load the package, rminer:

> install.packages("rminer")
> library(rminer)

2. Fit the svm model with the training set:

> model = fit(churn~., trainset, model="svm")

3. Use the Importance function to obtain the variable importance:

> VariableImportance = Importance(model, trainset, method="sensv")

4. Plot the variable importance ranked by the variance:

> L = list(runs=1, sen=t(VariableImportance$imp), sresponses=VariableImportance$sresponses)
> mgraph(L, graph="IMP", leg=names(trainset), col="gray", Grid=10)

Figure 2: The visualization of variable importance using the rminer package

How it works...

Similar to the caret package, the rminer package can also generate the variable importance of a classification model. In this recipe, we first train the svm model on the training dataset, trainset, with the fit function. Then, we use the Importance function to rank the variable importance with a sensitivity measure. Finally, we use mgraph to plot the rank of the variable importance.
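If you prefer the numbers to the plot, one hypothetical way to read the ranking is to attach the column names of trainset to the importance vector, assuming Importance returns one value per attribute, as the mgraph call above implies:

> imp = as.numeric(VariableImportance$imp)
> names(imp) = names(trainset)
> head(sort(imp, decreasing = TRUE))   # most influential attributes first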

6. Finding highly correlated features with the caret package

When performing regression or classification, some models perform better if highly correlated attributes are removed. The caret package provides the findCorrelation function, which can be used to find attributes that are highly correlated to each other. In this recipe, we will demonstrate how to find highly correlated features using the caret package.

How to do it...

Perform the following steps to find highly correlated attributes:

1. Remove the features that are not coded in numeric characters:

> new_train = trainset[, !names(churnTrain) %in% c("churn", "international_plan", "voice_mail_plan")]

2. Then, you can obtain the correlation of each attribute:

> cor_mat = cor(new_train)

3. Next, we use findCorrelation to search for highly correlated attributes with a cut off equal to 0.75:

> highlyCorrelated = findCorrelation(cor_mat, cutoff=0.75)

4. We then obtain the names of highly correlated attributes:

> names(new_train)[highlyCorrelated]
[1] "total_intl_minutes"  "total_day_charge"  "total_eve_minutes"  "total_night_minutes"

How it works...

In this recipe, we search for highly correlated attributes using the caret package. In order to retrieve the correlation of each attribute, one should first remove nonnumeric attributes. Then, we perform correlation to obtain a correlation matrix. Next, we use findCorrelation to find highly correlated attributes with the cut off set to 0.75. We finally obtain the names of the highly correlated attributes.
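A possible next step, not shown in the original recipe, is to drop the flagged columns before training a model that is sensitive to collinearity; the index vector returned by findCorrelation can be used directly:

> reduced_train = new_train[, -highlyCorrelated]   # assumes at least one column was flagged
> dim(reduced_train)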

Summary: making good use of cross-validation makes our study and work easier, and the big-data era keeps bringing us more and more convenience.

--------- Translation originally produced with Baidu Translate

李明玥

Date: 2024-12-24 04:56:15
