Programming Assignment 1: Linear Regression

Warm-up Exercise

Following the instructions, type this code in the warmUpExercise.m file:

A = eye(5);
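Calling the function from the Octave/Matlab prompt (assuming the standard warmUpExercise.m stub that returns A) should print the 5x5 identity matrix:

>> warmUpExercise()

ans =

     1     0     0     0     0
     0     1     0     0     0
     0     0     1     0     0
     0     0     0     1     0
     0     0     0     0     1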

Computing Cost (for One Variable)

The formula for the cost function (for one variable) is:

J(θ0, θ1) = 1/(2m) * ∑i=1~m (hθ(x(i)) - y(i))^2

We can implement it in the computeCost.m file as follows:

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

predictions = X * theta; % Calculate the hypothesis/prediction vector

sqrErrors = (predictions - y).^2; % Calculate the squared error for every element of the prediction vector

J = 1/(2*m)*sum(sqrErrors); % Sum the squared-error vector to get the cost function value

% =========================================================================

end
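A quick sanity check at the prompt, using a small made-up data set (not from the assignment's data files): with a theta that fits the points exactly the cost is 0, and with theta = [0; 0] it is 1/(2*3)*(4+16+36) ≈ 9.3333:

X = [ones(3,1) [1; 2; 3]];   % design matrix with an intercept column
y = [2; 4; 6];               % targets that lie exactly on y = 2*x
computeCost(X, y, [0; 2])    % perfect fit -> returns 0
computeCost(X, y, [0; 0])    % returns 9.3333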

Note: Calculating the cost function is useful for plotting the figure, but it's not used directly in gradient descent, because taking the derivative turns the squared term into a multiplication.

Gradient Descent (for One Variable)

The gradient descent update rule is:

θj := θj - α * ∂/∂θj J(θ0, ..., θn) = θj - α * (1/m) * ∑i=1~m (hθ(x(i)) - y(i)) * xj(i)    (update θ0 through θn simultaneously)

In Octave, a single line of code can accomplish the task, since Octave automatically broadcasts v .* M when v is a column vector with m elements and M has m rows (older versions of Matlab don't support this implicit broadcasting):

theta = theta - alpha/m * sum((X*theta-y).*X, 1)';

In Matlab, we can implement it with the steps below. This method is not great because it doesn't handle the case where the number of features n is larger than 1 (a fully vectorized alternative is sketched after the function). I also can't explain it in much detail, since I'm still not good at linear algebra and Matlab (I just kept debugging and trying ways to construct the result I wanted...); I need to improve both later.

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    prediction = X*theta;                        % m x 1 vector of hypotheses
    pre_y = prediction - y;                      % m x 1 vector of prediction errors
    pre2 = [pre_y pre_y];                        % duplicate the error column to match the 2 columns of X
    theta = theta - alpha/m*sum(pre2.*X, 1)';    % simultaneous update of theta(1) and theta(2)

    % fprintf('%f %f \n', theta(1), theta(2));
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
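As noted above, the pre2 = [pre_y pre_y] trick only works when X has exactly two columns (the intercept column plus one feature). A fully vectorized update that multiplies the error vector by X' avoids both broadcasting and column duplication, and works for any number of features in both Octave and Matlab. This is just a sketch of an alternative, not the code submitted above:

prediction = X * theta;                              % m x 1 vector of hypotheses
theta = theta - (alpha/m) * (X' * (prediction - y)); % (n+1) x 1 simultaneous update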

Feature Normalization

There are actually 3 values we need to return from this function: the mean value mu, sigma, and the normalized X matrix. (Even though X_norm is the first returned value of the function, it is calculated last -_-!!!)

The mean value is easy to compute:

mu = mean(X, 1);

Sigma is also easy to compute:

sigma = std(X, 0, 1);
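For example, for the third test matrix used below, these two lines give the following (my own quick check, not part of the provided test cases):

>> mu = mean([8 1 6; 3 5 7; 4 9 2], 1)

mu =

     5     5     5

>> sigma = std([8 1 6; 3 5 7; 4 9 2], 0, 1)

sigma =

    2.6458    4.0000    2.6458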

For the normalized X matrix, it was very hard to understand exactly what result the exercise wants. I tried many possibilities but still could not match the expected output. Finally I found their test cases:

>> featureNormalize([1 2 3]')

ans =

    -1
     0
     1

>> featureNormalize([1 2 3;6 4 2]')

ans =

    -1     1
     0     0
     1    -1

>> featureNormalize( [ 8 1 6; 3 5 7; 4 9 2 ] )

ans =

    1.1339   -1.0000    0.3780
   -0.7559    0         0.7559
   -0.3780    1.0000   -1.1339

>> featureNormalize([1 2 3 1;6 4 2 0;11 3 3 9;4 9 8 8]')

ans =

   -0.78335    1.16190    1.09141   -1.46571
    0.26112    0.38730   -0.84887    0.78923
    1.30558   -0.38730   -0.84887    0.33824
   -0.78335   -1.16190    0.60634    0.33824

My first try in Matlab:

X_norm = [X(:,1)/sigma(1,1) X(:,2)/sigma(1,2)];

It returns:

>> featureNormalize([1 2 3]')
Attempted to access X(:,2); index out of bounds because size(X)=[3,1].

Error in featureNormalize (line 39)
X_norm = [X(:,1)/sigma(1,1) X(:,2)/sigma(1,2)];

It looks like the expression I wrote only supports matrices with 2 columns, but it should also work in the multiple-variable case. So I came up with a version that uses the number of features n:

n = size(sigma,2);
for i=1:n,
     X_norm(:,n) = X(:,n)/sigma(1,n);
end

It returns:

>> featureNormalize([1 2 3]')

ans =

     1
     2
     3

>> featureNormalize([1 2 3;6 4 2]')

ans =

     1     3
     2     2
     3     1

The result seems very close to the test case result. What is missing? The mean value!!!
Add it immediately with great hope:

n = size(sigma,2);
for i=1:n,
     X_norm(:,n) = (X(:,n)-mu(:,n))/sigma(1,n);
end

Check the result:

>> featureNormalize([1 2 3]')

ans =

    -1
     0
     1

>> featureNormalize([1 2 3;6 4 2]')

ans =

     1     1
     2     0
     3    -1

>> featureNormalize( [ 8 1 6; 3 5 7; 4 9 2 ] )

ans =

    8.0000    1.0000    0.3780
    3.0000    5.0000    0.7559
    4.0000    9.0000   -1.1339

Why is only the last column correct? Because I mixed up i and n in the loop body... Finally we have:

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%       

mu = mean(X, 1);

sigma = std(X, 0, 1);

n = size(sigma,2);

for i=1:n,
     X_norm(:,i) = (X(:,i)-mu(:,i))/sigma(1,i);
end

% ============================================================

end

All returned results are the same as the test cases! Perfect!!!
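For reference, the loop can also be replaced by a single expression that normalizes all columns at once. The sketch below uses bsxfun so it works in older Matlab versions too; this is just an alternative, not what the exercise requires:

mu = mean(X, 1);                                          % 1 x n row of column means
sigma = std(X, 0, 1);                                     % 1 x n row of column standard deviations
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);  % subtract mu, then divide by sigma, column-wise
% In Octave (or Matlab R2016b and later) this can be written with broadcasting:
% X_norm = (X - mu) ./ sigma;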

Computing Cost (for Multiple Variables)

It is actually the same as the cost function for one variable, so the details are omitted...
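For completeness, here is a minimal sketch of what the multi-variable version (computeCostMulti.m) can look like; the vectorized expression is the same as before, since X*theta already handles any number of features:

function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
m = length(y);                    % number of training examples
errors = X * theta - y;           % m x 1 vector of prediction errors
J = (errors' * errors) / (2*m);   % same as 1/(2m) * sum of squared errors
end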
