Andrew Ng's Machine Learning course is available at: https://www.coursera.org/course/ml
The Linear Regression section introduces several new terms that come up frequently in later lessons:
Cost Function | Linear Regression | Gradient Descent | Normal Equation | Feature Scaling | Mean Normalization
Model Representation
m: number of training examples
x^(i): input (features) of the i-th training example
x_j^(i): value of feature j in the i-th training example
y^(i): "output" variable / "target" variable of the i-th training example
n: number of features
θ: parameters
Hypothesis: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n
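For concreteness, a minimal Octave sketch of the hypothesis evaluated for a whole training set at once (the toy numbers are mine, not from the course):

X = [1 1; 1 2; 1 3];   % m = 3 examples; the leading column of ones pairs with theta_0
theta = [0.5; 2];      % [theta_0; theta_1]
h = X * theta          % h_theta(x^(i)) for every example at once: [2.5; 4.5; 6.5]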
Cost Function
IDEA: Choose θ so that h_θ(x) is close to y for our training examples (x, y).
A. Linear Regression with One Variable Cost Function
Cost Function: J(θ_0, θ_1) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
Goal: minimize J(θ_0, θ_1) over θ_0, θ_1
Contour Plot: (figure not preserved; contour lines of J(θ_0, θ_1), with the minimum at the center.)
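As a quick sanity check of the cost formula above, a toy Octave computation (same toy data as the hypothesis sketch; the numbers are mine):

X = [1 1; 1 2; 1 3];                  % bias column plus one feature
y = [1; 2; 3];
theta = [0; 0.5];
m = length(y);
J = 1/(2*m) * sum((X*theta - y).^2)   % prints 0.58333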
B. Linear Regression with Multiple Variables Cost Function
Cost Function: J(θ) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
Goal: minimize J(θ) over θ
Gradient Descent
Outline: start with some initial θ (say, all zeros), then keep changing θ to reduce J(θ), hoping to end up at a minimum.
Gradient Descent Algorithm:
Repeat until convergence { θ_j := θ_j − α · ∂J(θ)/∂θ_j }, updating all θ_j (j = 0, …, n) simultaneously. For linear regression the derivative works out so that θ_j := θ_j − (α/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i).
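The "works out" step is a one-line chain-rule computation, using ∂h_θ(x)/∂θ_j = x_j; in LaTeX:

\frac{\partial}{\partial \theta_j} J(\theta)
  = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
  = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x_j^{(i)}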
The convergence of the iterations might look like the following:
(Figure not preserved: a contour plot with the minimum point at the center; the blue arc shows a possible convergence path.)
Learning Rate α:
1) If α is too small, gradient descent can be slow to converge;
2) If α is too large, gradient descent may not decrease on every iteration, or may not converge;
3) For sufficiently small α, J(θ) should decrease on every iteration.
Choosing the Learning Rate α: to debug, try a range of values, e.g. 0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1.0, and keep the largest α for which J(θ) still decreases on every iteration (see the sketch below).
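A minimal Octave sketch of that debugging recipe, with toy data and a hand-inlined update loop of my own (the real assignment wraps this loop in gradientDescent.m):

X = [1 1; 1 2; 1 3];  y = [1; 2; 3];  m = length(y);
alphas = [0.001 0.01 0.1 0.6];        % 0.6 is deliberately too large for this data
num_iters = 50;
for k = 1:length(alphas)
    theta = zeros(2, 1);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        theta = theta - alphas(k)/m * (X' * (X*theta - y));
        J_history(iter) = 1/(2*m) * sum((X*theta - y).^2);
    end
    semilogy(1:num_iters, J_history); hold on;   % a rising curve means alpha is too large
end
xlabel('iteration'); ylabel('J(theta)');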
"Batch" Gradient Descent: each step of gradient descent uses all the training examples.
"Stochastic" Gradient Descent: each step of gradient descent uses only one training example. The sketch below contrasts one step of each.
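A minimal sketch of the difference (toy data and variable names are mine):

X = [1 1; 1 2; 1 3];  y = [1; 2; 3];
m = length(y);  alpha = 0.1;  theta = zeros(2, 1);
% Batch: one step touches all m examples.
theta = theta - alpha/m * (X' * (X*theta - y));
% Stochastic: one step touches a single (here randomly chosen) example i.
i = randi(m);
xi = X(i, :)';                                   % example i as a column vector
theta = theta - alpha * (xi'*theta - y(i)) * xi;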
Normal Equation
IDEA: Solve for θ analytically: set ∂J(θ)/∂θ_j = 0 for every j, then solve for θ, giving θ = (X^T X)^(-1) X^T y.
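In Octave this is a single line; using pinv, as the course homework does, also tolerates a non-invertible X^T X (see the restriction below). The toy data are mine:

X = [1 1; 1 2; 1 3];  y = [1; 2; 3];   % points lying exactly on y = x
theta = pinv(X' * X) * X' * y          % recovers theta = [0; 1]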
Restriction: the Normal Equation does not work when (X^T X) is non-invertible.
PS: a matrix is invertible when it has full rank. Here that means the column vectors (features) are linearly independent, and the number of linearly independent row vectors (training examples) exceeds the number of columns (the number of features, n).
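One quick way to see this in Octave is to build a design matrix with a redundant feature and check its rank (the example is mine):

X = [1 1 2; 1 2 4; 1 3 6];   % third column is exactly 2x the second, so the columns are dependent
rank(X' * X)                 % prints 2 (< 3 columns), so X'X is non-invertible here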
Gradient Descent Algorithm vs. Normal Equation
Gradient Descent:
a) Need to choose α;
b) Needs many iterations;
c) Works well even when n is large (n > 1000 is appropriate).
Normal Equation:
a) No need to choose α;
b) No need to iterate;
c) Need to compute (X^T X)^(-1), which is O(n^3);
d) Slow if n is very large (n < 1000 is OK).
Feature Scaling
IDEA: Make sure features are on a similar scale.
Benefit: fewer iterations are needed, so gradient descent converges faster.
Example: if we need to get every feature into approximately a −1 ≤ x_i ≤ 1 range, feature values that fall in [−3, 3] or [−1/3, 1/3] are acceptable.
Mean normalization: replace x_i with (x_i − μ_i)/s_i, where μ_i is the average value of feature i over the training set and s_i is its range (max − min) or its standard deviation.
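A minimal Octave sketch of mean normalization (the file name matches the ex1 homework skeleton; picking the standard deviation for s_i is one of the two options above):

function [X_norm, mu, sigma] = featureNormalize(X)
    mu = mean(X);      % per-feature mean mu_i
    sigma = std(X);    % per-feature standard deviation, used as the scale s_i
    X_norm = (X - repmat(mu, size(X,1), 1)) ./ repmat(sigma, size(X,1), 1);
end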
HOMEWORK
Now that the video lectures are done, it's time for the homework. Below is the core code from the Linear Regression assignment:
1. computeCost.m / computeCostMulti.m
J = 1/(2*m) * sum((theta'*X' - y').^2);   % vectorized cost; equivalent to 1/(2*m)*sum((X*theta - y).^2)
2. gradientDescent.m / gradientDescentMulti.m
h = X*theta - y;    % residuals h_theta(x^(i)) - y^(i) for all examples
v = X'*h;           % gradient times m: sums residual * x_j over all examples
v = v*alpha/m;      % scale by the learning rate
theta1 = theta;     % keep a copy of the previous theta (unused below)
theta = theta - v;  % simultaneous update of every theta_j
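For context, a sketch of how those lines sit inside the full assignment loop (num_iters and J_history follow the ex1 skeleton; treat this as an outline rather than the graded solution):

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        h = X*theta - y;                      % residuals
        theta = theta - (alpha/m) * (X'*h);   % simultaneous update of all theta_j
        J_history(iter) = computeCost(X, y, theta);   % track convergence per iteration
    end
end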