Warm-up Exercise
Following the instructions, type this code into the warmUpExercise.m file:
A = eye(5);
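For context, here is a minimal sketch of the whole file (the function signature is assumed to match the course stub):

function A = warmUpExercise()
%WARMUPEXERCISE Example function returning the 5x5 identity matrix
A = eye(5);   % the 5x5 identity matrix
end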
Computing Cost (for One Variable)
The formula for the cost function (for one variable) is:

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
We can implement it in the computeCost.m file as follows:
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

predictions = X * theta;            % Calculate the hypothesis/predictions vector
sqrErrors = (predictions - y).^2;   % Calculate the squared error for every element of the predictions vector
J = 1/(2*m) * sum(sqrErrors);       % Sum the squared-error vector to get the cost function value

% =========================================================================

end
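A quick sanity check (the data values here are made up, not from the exercise): with a parameter vector that fits the data exactly, the cost should be zero.

X = [1 1; 1 2; 1 3];           % m = 3 examples: intercept column plus one feature
y = [1; 2; 3];
theta = [0; 1];                % h(x) = 0 + 1*x fits y exactly
J = computeCost(X, y, theta)   % J = 0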
Note: calculating the cost function is useful for plotting the figure, but it is not used directly in gradient descent, because taking the derivative turns the squared term into a multiplication.
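To see why, apply the chain rule: the 1/2 cancels the exponent 2, and the square becomes a plain product (with $h_\theta(x) = \theta^T x$, we have $\partial h_\theta(x^{(i)})/\partial\theta_j = x_j^{(i)}$):

$$\frac{\partial}{\partial \theta_j}\,\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$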
Gradient Descent (for One Variable)
The gradient descent update rule is:

$$\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial \theta_j} J(\theta_0, \ldots, \theta_n) = \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

(update $\theta_0$ through $\theta_n$ simultaneously)
In Octave, a single line of code accomplishes the task, since Octave supports broadcasting v .* M, where v has m elements and M has m rows. (MATLAB only gained this implicit-expansion behavior in R2016b, so older versions don't support it):
theta = theta - alpha/m*sum((X*theta-y).*X, 1)';
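To see what the broadcasting does, here is a made-up example (works in Octave, and in MATLAB R2016b or later):

v = [1; 2; 3];                 % m-by-1 column vector
M = [10 100; 20 200; 30 300];  % m-by-n matrix
v .* M                         % v is expanded across the columns of M
% ans =
%     10   100
%     40   400
%     90   900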
In MATLAB, we can implement it with the steps below. This method is not good because it doesn't support the case where the number of features n is larger than 1. I also can't explain it in much detail, since I'm still not good at linear algebra and MATLAB (I just kept debugging and trying ways to construct the result I wanted...); I need to improve in both areas later.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter
    %               vector theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    prediction = X*theta;     % m-by-1 hypothesis vector
    pre_y = prediction - y;   % m-by-1 error vector
    pre2 = [pre_y pre_y];     % duplicate the error column to match the two columns of X
    theta = theta - alpha/m*sum(pre2.*X, 1)';
    % fprintf('%f %f \n', theta(1), theta(2));

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
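For reference, a fully vectorized update that handles any number of features n (a sketch, not part of my submitted solution): the whole loop body above collapses to one line, because X'*(X*theta - y) already produces the column vector of gradient sums.

theta = theta - (alpha/m) * X' * (X*theta - y);   % (n+1)-by-1 gradient step for any n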
Feature Normalization
Actually there are three values we need to return from this function: the mean value, sigma, and the normalized X matrix. (Even though X_norm is the first return value of the function, it is calculated last -_-!!!)
For the mean value, it's easy to implement:
mu = mean(X, 1);
For sigma, it's also easy to implement:
sigma = std(X, 0, 1);
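A quick check of the dimension arguments (sample matrix made up): mean(X, 1) averages down each column, and std(X, 0, 1) takes the column-wise sample standard deviation (the 0 selects the default m-1 normalization).

X = [1 10; 2 20; 3 30];
mu = mean(X, 1)       % mu = [2 20]
sigma = std(X, 0, 1)  % sigma = [1 10]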
For the normalized X matrix, it's very hard to understand which result is the one the exercise wants. I tried many possibilities but still could not match the expected output. Finally I found their test cases:
>> featureNormalize([1 2 3]')
ans =
    -1
     0
     1

>> featureNormalize([1 2 3;6 4 2]')
ans =
    -1     1
     0     0
     1    -1

>> featureNormalize([ 8 1 6; 3 5 7; 4 9 2 ])
ans =
    1.1339   -1.0000    0.3780
   -0.7559         0    0.7559
   -0.3780    1.0000   -1.1339

>> featureNormalize([1 2 3 1;6 4 2 0;11 3 3 9;4 9 8 8]')
ans =
   -0.78335    1.16190    1.09141   -1.46571
    0.26112    0.38730   -0.84887    0.78923
    1.30558   -0.38730   -0.84887    0.33824
   -0.78335   -1.16190    0.60634    0.33824
And tried this in MATLAB:
X_norm = [X(:,1)/sigma(1,1) X(:,2)/sigma(1,2)];
It returns:
>> featureNormalize([1 2 3]')
Attempted to access X(:,2); index out of bounds because size(X)=[3,1].
Error in featureNormalize (line 39)
X_norm = [X(:,1)/sigma(1,1) X(:,2)/sigma(1,2)];
It looks like the function I composed only supports matrices with two columns, but it should support the multiple-variables case as well. So I came up with a version that uses the number of features n:
n = size(sigma,2);
for i=1:n,
    X_norm(:,n) = X(:,n)/sigma(1,n);
end
It returns:
>> featureNormalize([1 2 3]')
ans =
     1
     2
     3

>> featureNormalize([1 2 3;6 4 2]')
ans =
     1     3
     2     2
     3     1
The result looks very close to the test-case result, doesn't it? What is missing? The mean value!!!
I added it immediately with great hope:
n = size(sigma,2);
for i=1:n,
    X_norm(:,n) = (X(:,n)-mu(:,n))/sigma(1,n);
end
Check the result:
>> featureNormalize([1 2 3]')
ans =
    -1
     0
     1

>> featureNormalize([1 2 3;6 4 2]')
ans =
     1     1
     2     0
     3    -1

>> featureNormalize([ 8 1 6; 3 5 7; 4 9 2 ])
ans =
    8.0000    1.0000    0.3780
    3.0000    5.0000    0.7559
    4.0000    9.0000   -1.1339
Why is only the last column correct in each case? Because I misused i and n: the loop body always writes column n instead of column i. After fixing that, we finally have:
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%

mu = mean(X, 1);
sigma = std(X, 0, 1);
n = size(sigma, 2);
for i = 1:n,
    X_norm(:,i) = (X(:,i) - mu(:,i)) / sigma(1,i);
end

% ============================================================

end
All returned results are the same as the test cases! Perfect!!!
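As a side note, the loop can be avoided entirely. A sketch using bsxfun (which also works on MATLAB versions that predate the R2016b implicit expansion):

mu = mean(X, 1);
sigma = std(X, 0, 1);
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);  % subtract mu, then divide by sigma, column-wise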
Computing Cost (for Multiple Variables)
Actually it is the same as the cost function for one variable, since the vectorized implementation already handles any number of features. Omitted...
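Because computeCost is fully vectorized, it works unchanged when X has more feature columns. A tiny made-up check:

X = [1 2104 3; 1 1600 3; 1 2400 3];   % intercept plus two features
y = [400; 330; 369];
theta = zeros(3, 1);                   % all-zero parameters
J = computeCost(X, y, theta)           % J = sum(y.^2)/(2*3), about 6.7510e+04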