Handwritten digits recognition (0-9)
Multi-class Logistic Regression
1. Vectorizing Logistic Regression
(1) Vectorizing the cost function
(2) Vectorizing the gradient
(3) Vectorizing the regularized cost function
(4) Vectorizing the regularized gradient
All four formulas above can be found in the previous blog post: click here.
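For quick reference, the vectorized forms implemented in lrCostFunction.m below are the standard ones for regularized logistic regression, with h = g(Xθ) the vector of predictions and θ_0 left out of the regularization term:

\[ h = g(X\theta), \qquad g(z) = \frac{1}{1 + e^{-z}} \]
\[ J(\theta) = \frac{1}{m}\Bigl[-y^{T}\log(h) - (1-y)^{T}\log(1-h)\Bigr] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2} \]
\[ \frac{\partial J}{\partial \theta} = \frac{1}{m}X^{T}(h-y) + \frac{\lambda}{m}\begin{bmatrix}0\\ \theta_1 \\ \vdots \\ \theta_n\end{bmatrix} \]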
lrCostFunction.m
function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

hx = sigmoid(X * theta);                                   % hypothesis h_theta(x) for every example
reg = lambda / (2 * m) * sum(theta(2:end) .^ 2);           % regularization term, skipping theta(1)
J = -1 / m * (y' * log(hx) + (1 - y)' * log(1 - hx)) + reg;
theta(1) = 0;                                              % do not regularize the bias term
grad = 1 / m * X' * (hx - y) + lambda / m * theta;

% =============================================================

grad = grad(:);

end
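Once lrCostFunction.m is written, it is worth calling it on a tiny hand-made example and checking that the returned cost and gradient look sensible. A sketch of such a check (the toy values below are chosen for illustration, not the official grader test):

% Illustrative sanity check for lrCostFunction (toy data for this sketch)
theta_t  = [-2; -1; 1; 2];
X_t      = [ones(5, 1) reshape(1:15, 5, 3) / 10];   % 5 examples: bias column + 3 features
y_t      = [1; 0; 1; 0; 1];
lambda_t = 3;
[J_t, grad_t] = lrCostFunction(theta_t, X_t, y_t, lambda_t);
fprintf('Cost: %f\n', J_t);
fprintf('Gradients:\n'); fprintf(' %f\n', grad_t);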
2. One-vs-all Classification (Training)
Return all the classifier parameters in a matrix Θ (a K x (N+1) matrix, where K is num_labels and N is num_features), where each row of Θ corresponds to the learned logistic regression parameters for one class. You can do this with a 'for' loop from 1 to K, training each classifier independently.
oneVsAll.m
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tells you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with a large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

for c = 1:num_labels
    initial_theta = all_theta(c, :)';   % all zeros, used as the starting point for class c
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    % Treat class c as the positive class (y == c) and all other classes as negative
    theta = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
    all_theta(c, :) = theta';
end

% =========================================================================

end
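In the top-level script, training all ten classifiers then reduces to a single call. A sketch, assuming num_labels = 10 (digits 1-9 plus 0, which the dataset stores as label 10) and a small regularization strength such as 0.1:

num_labels = 10;    % ten digit classes; the digit 0 is stored as label 10 in the dataset
lambda = 0.1;       % assumed regularization strength for this sketch
[all_theta] = oneVsAll(X, y, num_labels, lambda);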
3. One-vs-all Classification (Prediction)
predictOneVsAll.m
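The contents of predictOneVsAll.m are not reproduced in this post. The idea is simply to evaluate all K classifiers on each example and pick the class with the highest predicted probability. A minimal sketch, assuming the same conventions as the files above:

function p = predictOneVsAll(all_theta, X)
%PREDICTONEVSALL Predict the label for each example using the trained
%one-vs-all classifiers stored in all_theta (one row per class)
m = size(X, 1);
X = [ones(m, 1) X];                            % add the bias column, as in training
[~, p] = max(sigmoid(X * all_theta'), [], 2);  % most confident classifier wins
end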
Neural Network Prediction
Feedforward Propagation and Prediction
predict.m
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

a1 = [ones(m, 1), X];               % input layer plus bias unit:  5000 x 401
a2 = sigmoid(a1 * Theta1');         % hidden layer activations:    5000 x 25
a2 = [ones(size(a2, 1), 1), a2];    % add bias unit:               5000 x 26
a3 = sigmoid(a2 * Theta2');         % output layer activations:    5000 x 10
[~, p] = max(a3, [], 2);            % predicted label = index of the largest output

% =========================================================================

end
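With either predictor, the training-set accuracy can be checked by comparing the predicted labels against y, for example:

pred = predict(Theta1, Theta2, X);   % or: pred = predictOneVsAll(all_theta, X);
fprintf('Training Set Accuracy: %f\n', mean(double(pred == y)) * 100);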
Other files and the dataset can be downloaded from Coursera.