Machine Learning---Andrew Ng---Week6_1 (Ways to Improve a Learning Algorithm)

Applying Machine Learning

Machine Learning Diagnostics

A diagnostic is a test you can run to gain insight into what is or isn't working with a learning algorithm. It will often also show you which changes are most promising for improving the algorithm's performance. Diagnostics can take quite a lot of time to implement and understand. Even so, building them is usually a very good use of your time when developing learning algorithms, because they can save you from spending many months on an approach that you could have recognized much earlier as unproductive.

Evaluating a Hypothesis

Things to try when a model makes unacceptably large errors in its predictions:

Getting more training examples
Trying smaller sets of features
Trying additional features
Trying polynomial features
Increasing or decreasing λ

Evaluating a Hypothesis

Given a dataset of examples, we can split the data into two sets: a training set and a test set. Typically, the training set consists of 70% of the data and the test set is the remaining 30%.
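A minimal sketch of the 70/30 split, assuming the examples are numpy arrays. The examples are shuffled first so that a dataset stored in sorted order does not put all similar examples into one set; the shuffle and the name split_train_test are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def split_train_test(X, y, train_frac=0.7, seed=0):
    """Hypothetical helper: shuffle the examples, keep train_frac of them for
    training and the rest for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # random order of example indices
    n_train = int(round(train_frac * len(y)))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

X = np.arange(20, dtype=float).reshape(10, 2)   # 10 toy examples, 2 features
y = np.arange(10, dtype=float)
X_tr, y_tr, X_te, y_te = split_train_test(X, y)
print(len(y_tr), len(y_te))                      # 7 3
```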

Training and testing procedure

Method 1 (linear regression):

Learn Θ by minimizing Jtrain(Θ) on the training set.
Compute the test set error: Jtest(Θ) = 1/(2·m_test) · Σ_{i=1..m_test} (hΘ(x_test(i)) − y_test(i))²

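Method 2 (classification):

Learn Θ by minimizing Jtrain(Θ) on the training set.
Compute the misclassification error (aka 0/1 misclassification error), which suits classification problems:
err(hΘ(x), y) = 1 if hΘ(x) ≥ 0.5 and y = 0, or if hΘ(x) < 0.5 and y = 1; otherwise err(hΘ(x), y) = 0
Test error = (1/m_test) · Σ_{i=1..m_test} err(hΘ(x_test(i)), y_test(i))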
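Both test-set measures can be sketched in a few lines of numpy. This assumes X already contains the bias column x0 = 1 and that classification labels are 0/1. The function names are hypothetical, and the logistic hypothesis hΘ(x) = sigmoid(Θᵀx) is the usual one from the course's logistic regression lectures.

```python
import numpy as np

def linear_test_error(theta, X, y):
    """Jtest(Θ) = 1/(2 m_test) * Σ (hΘ(x) - y)^2 for linear regression."""
    residuals = X @ theta - y
    return residuals @ residuals / (2 * len(y))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def misclassification_error(theta, X, y):
    """Fraction of examples where thresholding hΘ(x) at 0.5 disagrees with the 0/1 label."""
    predictions = (sigmoid(X @ theta) >= 0.5).astype(int)
    return np.mean(predictions != y)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # bias column + one feature
print(linear_test_error(np.array([0.0, 1.0]), X, np.array([0.0, 1.0, 2.0, 4.0])))   # 0.125
print(misclassification_error(np.array([-1.5, 1.0]), X, np.array([0, 1, 1, 1])))    # 0.25
```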

Model Selection Problem

Given many models with different polynomial degrees, we can use a systematic approach to identify the 'best' one, i.e. the polynomial regression model that generalizes best to new data.

Train/Validation/Test Sets

Split the data into three different sets: a training set, a cross validation set, and a test set. A typical split is:

Training set: 60%
Cross validation set: 20%
Test set: 20%

Model selection procedure

Optimize the parameters in Θ using the training set for each polynomial degree.
Find the polynomial degree d with the least error using the cross validation set.
Estimate the generalization error using the test set with Jtest(Θ(d)), where d is the degree of the polynomial with the lowest cross validation error. (JCV of the chosen model is an optimistic estimate, because d was picked to minimize it; the untouched test set gives a fair one.)
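The three steps can be sketched with numpy's polyfit on synthetic one-dimensional data. The quadratic data, the degree range 1 to 10, and the helper half_mse are assumptions made for illustration. np.polyfit does the unregularized least-squares fit that stands in for "optimize Θ on the training set".

```python
import numpy as np

def half_mse(coeffs, x, y):
    """J(Θ) = 1/(2m) Σ (hΘ(x) - y)^2 for a polynomial hypothesis."""
    r = np.polyval(coeffs, x) - y
    return np.mean(r ** 2) / 2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(0, 0.1, 100)   # hypothetical quadratic data

# 60% / 20% / 20% split (the samples are already in random order)
x_tr, x_cv, x_te = x[:60], x[60:80], x[80:]
y_tr, y_cv, y_te = y[:60], y[60:80], y[80:]

# 1. Fit Θ on the training set for each degree d
models = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 11)}
# 2. Pick the degree with the lowest cross validation error
cv_errors = {d: half_mse(c, x_cv, y_cv) for d, c in models.items()}
best_d = min(cv_errors, key=cv_errors.get)
# 3. Estimate the generalization error on the untouched test set
test_error = half_mse(models[best_d], x_te, y_te)
print(best_d, test_error)
```

The test set is used exactly once, after d is fixed. Reusing it to choose d would make Jtest as optimistic as JCV.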

Diagnosing Bias vs. Variance

We need to distinguish whether bias or variance is the problem causing bad predictions.
High bias is underfitting and high variance is overfitting. Ideally, we need to find a golden mean between the two.

The training error will tend to decrease as we increase the degree d of the polynomial. At the same time, the cross validation error will tend to decrease as we increase d up to a point, and then increase as d grows further, forming a U-shaped (convex) curve.

High bias (underfitting): both Jtrain(Θ) and JCV(Θ) will be high. Also, JCV(Θ)≈Jtrain(Θ).

High variance (overfitting): Jtrain(Θ) will be low and JCV(Θ) will be much greater than Jtrain(Θ).
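The two rules above can be written as a rough heuristic. The notes give no numeric thresholds, so target_error (what counts as "high" error) and gap_factor (what counts as "much greater") are hypothetical parameters that you would set for your own problem. The function name diagnose is also illustrative.

```python
def diagnose(j_train, j_cv, target_error, gap_factor=2.0):
    """Rough bias/variance label from Jtrain and JCV.

    target_error and gap_factor are assumed thresholds, not values from the course.
    """
    high_bias = j_train > target_error                                   # cannot even fit the training set
    high_variance = j_cv > target_error and j_cv > gap_factor * j_train  # large train/CV gap
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "neither"

print(diagnose(0.9, 1.0, target_error=0.1))    # high bias (underfitting)
print(diagnose(0.01, 0.5, target_error=0.1))   # high variance (overfitting)
print(diagnose(0.05, 0.06, target_error=0.1))  # neither
```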

Regularization and Bias/Variance

The problem:

As λ increases, the fit becomes more rigid and the model tends to underfit. As λ approaches 0, we tend to overfit the data. For a flexible model (e.g. a high-degree polynomial), a λ that is too small gives high variance and a λ that is too large gives high bias.

How to choose a suitable λ:

Create a list of lambdas (e.g. λ∈{0,0.01,0.02,0.04,0.08,0.16,0.32,0.64,1.28,2.56,5.12,10.24}).
Create a set of models with different degrees or any other variants.
Iterate through the λs, and for each λ go through all the models to learn some Θ.
Compute the cross validation error JCV(Θ) using the learned Θ (computed with λ), but evaluate JCV(Θ) without regularization, i.e. with λ = 0.
Select the combination of model and λ that produces the lowest error on the cross validation set.
Using the best combination of Θ and λ, compute Jtest(Θ) to see whether it generalizes well to the problem.
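A sketch of this search for regularized linear regression with polynomial features. It assumes the closed-form regularized normal equation (XᵀX + λL)Θ = Xᵀy, where L is the identity with L[0,0] = 0 so that θ0 is not penalized, in place of whatever optimizer you actually use. The sin-shaped data, the degrees (1, 3, 8), the 20/20/20 split, and all helper names are hypothetical. Note that the cost used for JCV and Jtest has no λ term.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree (bias column x0 = 1 first)."""
    return np.vander(x, degree + 1, increasing=True)

def fit_regularized(X, y, lam):
    """Regularized normal equation (X'X + lam*L) theta = X'y; theta_0 is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def cost(theta, X, y):
    """Unregularized J(theta) = 1/(2m) * sum((h(x) - y)^2), used for JCV and Jtest."""
    r = X @ theta - y
    return r @ r / (2 * len(y))

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.2, 60)    # hypothetical noisy data

lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
combos = {}                                    # (degree, lambda) -> (theta, JCV)
for d in (1, 3, 8):                            # the "set of models"
    X = poly_features(x, d)
    for lam in lambdas:
        theta = fit_regularized(X[:20], y[:20], lam)                 # train on 20 examples
        combos[(d, lam)] = (theta, cost(theta, X[20:40], y[20:40]))  # JCV without the λ term

best = min(combos, key=lambda k: combos[k][1])
best_d, best_lam = best
test_err = cost(combos[best][0], poly_features(x, best_d)[40:], y[40:])
print(best_d, best_lam, test_err)
```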

Learning Curves

Experiencing high bias:

Low training set size: causes Jtrain(Θ) to be low and JCV(Θ) to be high.

Large training set size: causes both Jtrain(Θ) and JCV(Θ) to be high, with Jtrain(Θ)≈JCV(Θ). As m grows, Jtrain(Θ) rises and JCV(Θ) falls until both plateau at about the same high level.

If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

Experiencing high variance:

Low training set size: Jtrain(Θ) will be low and JCV(Θ) will be high.

Large training set size: Jtrain(Θ) increases with training set size and JCV(Θ) continues to decrease without leveling off. Also, Jtrain(Θ) < JCV(Θ), but the gap between them remains significant.

If a learning algorithm is suffering from high variance, getting more training data is likely to help.
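A learning curve can be sketched by training on the first m examples for growing m. Jtrain(Θ) is measured on those m examples only, while JCV(Θ) is always measured on the full cross validation set. This sketch assumes an unregularized least-squares fit (np.linalg.lstsq) and deliberately fits a straight line to hypothetical quadratic data, so it should show the high-bias shape described above. The helper names are illustrative.

```python
import numpy as np

def half_mse(theta, X, y):
    r = X @ theta - y
    return r @ r / (2 * len(y))

def learning_curve(X_tr, y_tr, X_cv, y_cv, sizes):
    """Return (m, Jtrain, JCV) for each training set size m."""
    curve = []
    for m in sizes:
        theta, *_ = np.linalg.lstsq(X_tr[:m], y_tr[:m], rcond=None)
        curve.append((m, half_mse(theta, X_tr[:m], y_tr[:m]), half_mse(theta, X_cv, y_cv)))
    return curve

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 300)
y = x ** 2 + rng.normal(0, 0.1, 300)          # hypothetical quadratic data
X = np.column_stack([np.ones_like(x), x])     # straight-line model: high bias
curve = learning_curve(X[:200], y[:200], X[200:], y[200:], [2, 5, 10, 50, 200])
for m, j_tr, j_cv in curve:
    print(f"m={m:3d}  Jtrain={j_tr:.4f}  JCV={j_cv:.4f}")
```

Swapping X for high-degree polynomial features should instead show a persistent gap between the two curves, the high-variance pattern.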

Which fix addresses which problem

Getting more training examples: Fixes high variance

Trying smaller sets of features: Fixes high variance

Adding features: Fixes high bias

Adding polynomial features: Fixes high bias

Decreasing λ: Fixes high bias

Increasing λ: Fixes high variance

Diagnosing Neural Networks

A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.
A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

Using a single hidden layer is a good default starting point. You can train networks with different numbers of hidden layers, compare their errors on the cross validation set, and select the one that performs best.

Model Complexity Effects:

Lower-order polynomials (low model complexity) have high bias and low variance. In this case, the model consistently fits poorly.
Higher-order polynomials (high model complexity) fit the training data extremely well and the test data extremely poorly. These have low bias on the training data, but very high variance.
In practice, we want to choose a model somewhere in between: one that fits the data reasonably well and also generalizes well.

Original article: https://www.cnblogs.com/zouhq/p/10677620.html

Time: 2024-10-09 23:29:41
