Warm-up Exercise
Following the instructions, type this code into the warmUpExercise.m file:
A = eye(5);
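For context, here is a minimal sketch of the whole file (the function signature is assumed to match the course stub):

function A = warmUpExercise()
%WARMUPEXERCISE Example function returning the 5x5 identity matrix
A = eye(5);   % the 5x5 identity matrix
end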
Computing Cost (for One Variable)
The formula for the cost function (for one variable) is:

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
We can implement it in the computeCost.m file as follows:
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

predictions = X * theta;            % Calculate the hypothesis/predictions vector
sqrErrors = (predictions - y).^2;   % Calculate the squared error for every element of the predictions vector
J = 1/(2*m) * sum(sqrErrors);       % Sum the squared-error vector to get the cost function value

% =========================================================================

end
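A quick sanity check (the data values here are made up, not from the exercise): with a parameter vector that fits the data exactly, the cost should be zero.

X = [1 1; 1 2; 1 3];           % m = 3 examples: intercept column plus one feature
y = [1; 2; 3];
theta = [0; 1];                % h(x) = 0 + 1*x fits y exactly
J = computeCost(X, y, theta)   % J = 0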
Note: calculating the cost function is useful for plotting the figure, but it is not used directly in gradient descent, because taking the derivative turns the squared term into a multiplication.
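To see why, apply the chain rule: the 1/2 cancels the exponent 2, and the square becomes a plain product (with $h_\theta(x) = \theta^T x$, we have $\partial h_\theta(x^{(i)})/\partial\theta_j = x_j^{(i)}$):

$$\frac{\partial}{\partial \theta_j}\,\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$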
Gradient Descent (for One Variable)
The gradient descent update rule is:

$$\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial \theta_j} J(\theta_0, \ldots, \theta_n) = \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

(update $\theta_0$ through $\theta_n$ simultaneously)
In Octave, a single line of code accomplishes the task, since Octave supports broadcasting v .* M, where v has m elements and M has m rows. (MATLAB only gained this implicit-expansion behavior in R2016b, so older versions don't support it):
theta = theta - alpha/m*sum((X*theta-y).*X, 1)';
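To see what the broadcasting does, here is a made-up example (works in Octave, and in MATLAB R2016b or later):

v = [1; 2; 3];                 % m-by-1 column vector
M = [10 100; 20 200; 30 300];  % m-by-n matrix
v .* M                         % v is expanded across the columns of M
% ans =
%     10   100
%     40   400
%     90   900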
In MATLAB, we can implement it with the steps below. This method is not good because it doesn't support the case where the number of features n is larger than 1. I also can't explain it in much detail, since I'm still not good at linear algebra and MATLAB (I just kept debugging and trying ways to construct the result I wanted...); I need to improve in both areas later.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter
    %               vector theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    prediction = X*theta;     % m-by-1 hypothesis vector
    pre_y = prediction - y;   % m-by-1 error vector
    pre2 = [pre_y pre_y];     % duplicate the error column to match the two columns of X
    theta = theta - alpha/m*sum(pre2.*X, 1)';
    % fprintf('%f %f \n', theta(1), theta(2));

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
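For reference, a fully vectorized update that handles any number of features n (a sketch, not part of my submitted solution): the whole loop body above collapses to one line, because X'*(X*theta - y) already produces the column vector of gradient sums.

theta = theta - (alpha/m) * X' * (X*theta - y);   % (n+1)-by-1 gradient step for any n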
Feature Normalization
Actually there are three values we need to return from this function: the mean value, sigma, and the normalized X matrix. (Even though X_norm is the first return value of the function, it is calculated last -_-!!!)
For the mean value, it's easy to implement:
mu = mean(X, 1);
For sigma, it's also easy to implement:
sigma = std(X, 0, 1);
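A quick check of the dimension arguments (sample matrix made up): mean(X, 1) averages down each column, and std(X, 0, 1) takes the column-wise sample standard deviation (the 0 selects the default m-1 normalization).

X = [1 10; 2 20; 3 30];
mu = mean(X, 1)       % mu = [2 20]
sigma = std(X, 0, 1)  % sigma = [1 10]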
For the normalized X matrix, it's very hard to understand which result is the one the exercise wants. I tried many possibilities but still could not match the expected output. Finally I found their test cases:
>> featureNormalize([1 2 3]')
ans =
    -1
     0
     1

>> featureNormalize([1 2 3;6 4 2]')
ans =
    -1     1
     0     0
     1    -1

>> featureNormalize([ 8 1 6; 3 5 7; 4 9 2 ])
ans =
    1.1339   -1.0000    0.3780
   -0.7559         0    0.7559
   -0.3780    1.0000   -1.1339

>> featureNormalize([1 2 3 1;6 4 2 0;11 3 3 9;4 9 8 8]')
ans =
   -0.78335    1.16190    1.09141   -1.46571
    0.26112    0.38730   -0.84887    0.78923
    1.30558   -0.38730   -0.84887    0.33824
   -0.78335   -1.16190    0.60634    0.33824
And tried this in MATLAB:
X_norm = [X(:,1)/sigma(1,1) X(:,2)/sigma(1,2)];
It returns:
>> featureNormalize([1 2 3]')
Attempted to access X(:,2); index out of bounds because size(X)=[3,1].
Error in featureNormalize (line 39)
X_norm = [X(:,1)/sigma(1,1) X(:,2)/sigma(1,2)];
It looks like the function I composed only supports matrices with two columns, but it should support the multiple-variables case as well. So I came up with a version that uses the number of features n:
n = size(sigma,2);
for i=1:n,
    X_norm(:,n) = X(:,n)/sigma(1,n);
end
It returns:
>> featureNormalize([1 2 3]')
ans =
     1
     2
     3

>> featureNormalize([1 2 3;6 4 2]')
ans =
     1     3
     2     2
     3     1
The result looks very close to the test-case result, doesn't it? What is missing? The mean value!!!
I added it immediately with great hope:
n = size(sigma,2);
for i=1:n,
    X_norm(:,n) = (X(:,n)-mu(:,n))/sigma(1,n);
end
Check the result:
>> featureNormalize([1 2 3]')
ans =
    -1
     0
     1

>> featureNormalize([1 2 3;6 4 2]')
ans =
     1     1
     2     0
     3    -1

>> featureNormalize([ 8 1 6; 3 5 7; 4 9 2 ])
ans =
    8.0000    1.0000    0.3780
    3.0000    5.0000    0.7559
    4.0000    9.0000   -1.1339
Why is only the last column correct in each case? Because I misused i and n: the loop body always writes column n instead of column i. After fixing that, we finally have:
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%

mu = mean(X, 1);
sigma = std(X, 0, 1);
n = size(sigma, 2);
for i = 1:n,
    X_norm(:,i) = (X(:,i) - mu(:,i)) / sigma(1,i);
end

% ============================================================

end
All returned results are the same as the test cases! Perfect!!!
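As a side note, the loop can be avoided entirely. A sketch using bsxfun (which also works on MATLAB versions that predate the R2016b implicit expansion):

mu = mean(X, 1);
sigma = std(X, 0, 1);
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);  % subtract mu, then divide by sigma, column-wise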
Computing Cost (for Multiple Variables)
Actually it is the same as the cost function for one variable, since the vectorized implementation already handles any number of features. Omitted...
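Because computeCost is fully vectorized, it works unchanged when X has more feature columns. A tiny made-up check:

X = [1 2104 3; 1 1600 3; 1 2400 3];   % intercept plus two features
y = [400; 330; 369];
theta = zeros(3, 1);                   % all-zero parameters
J = computeCost(X, y, theta)           % J = sum(y.^2)/(2*3), about 6.7510e+04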