Linear Regression ----- Stanford Machine Learning (by Andrew NG) Course Notes

Andrew NG's Machine Learning course is available at: https://www.coursera.org/course/ml

The Linear Regression section introduces some new terms that will appear frequently in later lessons:

Cost Function
Linear Regression
Gradient Descent
Normal Equation
Feature Scaling
Mean Normalization


Model Representation

m: number of training examples

x^(i): input (features) of the i-th training example

x_j^(i): value of feature j in the i-th training example

y^(i): "output" variable / "target" variable of the i-th training example

n: number of features

θ: parameters

Hypothesis: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n
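With the usual convention x_0 = 1, the hypothesis is the inner product h_θ(x) = θᵀx, which vectorizes over the whole training set. A minimal Octave sketch (the numeric values here are made up purely for illustration):

X = [1 2104; 1 1416; 1 1534];  % m x (n+1) design matrix; first column is x0 = 1
theta = [340; 0.13];           % (n+1) x 1 parameter vector (illustrative values)
h = X * theta;                 % m x 1 vector of predictions h_theta(x^(i))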


Cost Function

 
IDEA: choose θ so that h_θ(x) is close to y for our training examples (x, y).

A. Linear Regression with One Variable Cost Function

Cost Function: J(θ_0, θ_1) = (1 / 2m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²

Goal: minimize J(θ_0, θ_1) over θ_0, θ_1

Contour Plot: (figure omitted: contours of J(θ_0, θ_1), with the minimum at the center of the innermost contour)

B. Linear Regression with Multiple Variables Cost Function

Cost Function: J(θ) = (1 / 2m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²

Goal: minimize J(θ) over θ
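As a quick numeric check of the formula, here is a tiny Octave example on a made-up three-point training set (assumed data, for illustration only):

X = [1 1; 1 2; 1 3];                 % m = 3 examples, with x0 = 1 prepended
y = [1; 2; 3];
theta = [0; 1];                      % h_theta(x) = x fits the data exactly
J = 1/(2*3) * sum((X*theta - y).^2)  % J = 0
theta = [0; 0.5];                    % a worse fit
J = 1/(2*3) * sum((X*theta - y).^2)  % J = 7/12, approximately 0.5833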


Gradient Descent

Outline: start with some θ (say θ = 0), then keep changing θ to reduce J(θ) until we hopefully end up at a minimum.

Gradient Descent Algorithm: repeat until convergence { θ_j := θ_j − α · (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) ) · x_j^(i) }, updating all θ_j (j = 0, …, n) simultaneously.

The convergence of the iterations might look like the following:

(Figure omitted: a contour plot with the minimum at the center; the blue arc marks one possible convergence path.)
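A minimal Octave sketch of the update rule above, recording J(θ) after each iteration so convergence can be inspected (assumes X, y, theta, alpha, and num_iters are already defined):

m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    grad = (1/m) * (X' * (X*theta - y));  % (n+1) x 1 gradient of J(theta)
    theta = theta - alpha * grad;         % simultaneous update of all theta_j
    J_history(iter) = 1/(2*m) * sum((X*theta - y).^2);
end
plot(1:num_iters, J_history);             % J should decrease on every iteration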

Learning Rate α:

1) If α is too small, gradient descent can be slow to converge.

2) If α is too large, J(θ) may fail to decrease on some iterations, and gradient descent may not converge at all.

3) For sufficiently small α, J(θ) should decrease on every iteration.

Choose Learning Rate α: to debug, try candidate values spaced roughly 3× apart, e.g. 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, plot J(θ) against the number of iterations for each, and pick the largest α that still decreases J(θ) on every iteration.
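A small Octave sketch of this sweep, assuming a gradientDescent(X, y, theta, alpha, num_iters) helper like the homework's that also returns the per-iteration history of J(θ):

alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1.0];
num_iters = 100;
hold on;
for k = 1:length(alphas)
    [~, J_history] = gradientDescent(X, y, zeros(size(X,2),1), alphas(k), num_iters);
    plot(1:num_iters, J_history);   % one convergence curve per candidate alpha
end
hold off;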

“Batch” Gradient Descent: each step of gradient descent uses all m training examples.

“Stochastic” Gradient Descent: each step of gradient descent uses only one training example.
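For contrast, a minimal Octave sketch of the stochastic variant, updating θ from one example at a time over one pass through a shuffled training set (assumes X, y, theta, and alpha are defined):

m = length(y);
for i = randperm(m)                        % visit examples in random order
    err = X(i,:) * theta - y(i);           % scalar error on example i
    theta = theta - alpha * err * X(i,:)'; % update from this single example
end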


Normal Equation

IDEA: a method to solve for θ analytically.

Set ∂J(θ)/∂θ_j = 0 for every j and solve, which gives θ = (XᵀX)⁻¹ Xᵀ y.

Restriction: the Normal Equation does not work when XᵀX is non-invertible.

PS: XᵀX is invertible when X has full column rank, i.e. the column vectors (features) are linearly independent and the number of linearly independent row vectors (training examples) is at least the number of columns (the number of features n).
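In Octave the Normal Equation is a one-liner; using pinv rather than inv is the usual way to sidestep the non-invertible case:

theta = pinv(X' * X) * X' * y;  % (n+1) x 1; pinv works even if X'*X is singular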


Gradient Descent Algorithm vs. Normal Equation

Gradient Descent:

a) Needs to choose α;

b) Needs many iterations;

c) Works well even when n is large (n > 1000 is appropriate).

Normal Equation:

a) No need to choose α;

b) No need to iterate;

c) Needs to compute (XᵀX)⁻¹;

d) Slow if n is very large (n < 1000 is OK).


Feature Scaling

IDEA: make sure features are on a similar scale.

Benefit: reduces the number of iterations, helping gradient descent converge quickly.

Example: if we want every feature approximately in the range −1 ≤ x_i ≤ 1, feature values falling in ranges such as [−3, 3] or [−1/3, 1/3] are acceptable.

Mean normalization: x_i := (x_i − μ_i) / s_i, where μ_i is the mean of feature i over the training set and s_i is either the range (max − min) or the standard deviation of feature i.
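A minimal Octave sketch of mean normalization applied column by column, using the standard deviation as s_i (in the spirit of the homework's featureNormalize.m; apply it before prepending the x0 = 1 column):

function [X_norm, mu, sigma] = featureNormalize(X)
  mu = mean(X);                % 1 x n row of per-feature means
  sigma = std(X);              % 1 x n row of per-feature standard deviations
  X_norm = (X - mu) ./ sigma;  % broadcast: subtract mean, divide by s_i
end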


HOMEWORK

Now that the video lectures are done, let's do the homework. Below is the core code for the Linear Regression assignment:

1. computeCost.m / computeCostMulti.m

J = 1/(2*m) * sum((X*theta - y).^2);  % vectorized cost over all m examples

2. gradientDescent.m / gradientDescentMulti.m

h = X*theta - y;            % m x 1 vector of prediction errors
grad = (alpha/m) * (X'*h);  % gradient step scaled by the learning rate
theta = theta - grad;       % simultaneous update of every theta_j
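For reference, the assignment wraps these snippets in functions with the signatures computeCost(X, y, theta) and gradientDescent(X, y, theta, alpha, num_iters); a typical call from the driver script (α = 0.01 and 1500 iterations are, if I recall the ex1 script correctly, the values it uses) looks like:

theta = zeros(size(X, 2), 1);   % start from theta = 0
J = computeCost(X, y, theta);   % cost before any updates
[theta, J_history] = gradientDescent(X, y, theta, 0.01, 1500);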
