CheeseZH: Stanford University: Machine Learning Ex3: Multiclass Logistic Regression and Neural Network Prediction

Handwritten digits recognition (0-9)

Multi-class Logistic Regression

1. Vectorizing Logistic Regression

(1) Vectorizing the cost function

(2) Vectorizing the gradient

(3) Vectorizing the regularized cost function

(4) Vectorizing the regularized gradient

All four of the formulas above can be found in the previous post on Ex2: Logistic Regression.
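For reference, the vectorized forms implemented below are the standard regularized logistic regression cost and gradient (written here in LaTeX; they match the code in lrCostFunction.m):

$$J(\theta) = \frac{1}{m}\Big(-y^{\top}\log(h) - (1-y)^{\top}\log(1-h)\Big) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}, \qquad h = \operatorname{sigmoid}(X\theta)$$

$$\nabla J(\theta) = \frac{1}{m}X^{\top}(h-y) + \frac{\lambda}{m}\begin{bmatrix}0\\ \theta_1 \\ \vdots \\ \theta_n\end{bmatrix}$$

The bias parameter \(\theta_0\) is excluded from the regularization term in both expressions.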

lrCostFunction.m

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

hx = sigmoid(X * theta);                           % hypothesis for all examples
reg = lambda / (2 * m) * sum(theta(2:end) .^ 2);   % regularization term, skipping theta(1)
J = -1 / m * (y' * log(hx) + (1 - y)' * log(1 - hx)) + reg;
theta(1) = 0;                                      % do not regularize the bias parameter
grad = 1 / m * X' * (hx - y) + lambda / m * theta;

% =============================================================

grad = grad(:);

end
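As a quick sanity check, the function can be called on a small hand-made problem. This is a minimal sketch: the test values below are illustrative only (not from the exercise script), and it assumes the exercise's sigmoid.m is on the path.

theta_t = [-2; -1; 1; 2];
X_t = [ones(5, 1) reshape(1:15, 5, 3) / 10];   % 5 examples, 3 features plus bias column
y_t = [1; 0; 1; 0; 1] >= 0.5;
lambda_t = 3;
[J_t, grad_t] = lrCostFunction(theta_t, X_t, y_t, lambda_t);
fprintf('Cost: %f\n', J_t);
disp('Gradients:'); disp(grad_t);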

2. One-vs-all Classification (Training)

Return all the classifier parameters in a matrix Θ (a K × (N+1) matrix, where K is num_labels and N is num_features); each row of Θ holds the learned logistic regression parameters for one class. You can do this with a 'for'-loop from 1 to K, training each classifier independently.

oneVsAll.m

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tells you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with a large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

for c = 1:num_labels
  initial_theta = all_theta(c, :)';                  % start from zeros for class c
  options = optimset('GradObj', 'on', 'MaxIter', 50);
  theta = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
  all_theta(c, :) = theta';                          % store the learned parameters as row c
end

% =========================================================================

end
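In the main ex3 script, training then looks roughly like the snippet below (the lambda value shown is an assumption of a commonly used setting, not taken from this post):

num_labels = 10;   % digits 1..10; the digit "0" is stored as label 10 in the dataset
lambda = 0.1;      % assumed regularization strength
[all_theta] = oneVsAll(X, y, num_labels, lambda);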

3. One-vs-all Classification (Prediction)

predictOneVsAll.m
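The listing for predictOneVsAll.m is not included in this post. A minimal sketch that follows the same one-vs-all idea (pick the class whose classifier assigns the highest probability) could look like this:

function p = predictOneVsAll(all_theta, X)
%PREDICTONEVSALL Predict the label (1..num_labels) for each example in X
%   using the one-vs-all parameters trained by oneVsAll.

m = size(X, 1);
p = zeros(m, 1);

% Add the bias column, matching what was done during training
X = [ones(m, 1) X];

% Each column of sigmoid(X * all_theta') is the probability assigned by one
% classifier; the predicted label is the column index with the largest value.
[~, p] = max(sigmoid(X * all_theta'), [], 2);

end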

Neural Network Prediction

Feedforward Propagation and Prediction

predict.m

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%
a1 = [ones(m, 1), X];            % input layer with bias unit: 5000 x 401
a2 = sigmoid(a1 * Theta1');      % hidden layer activations:   5000 x 25
a2 = [ones(size(a2, 1), 1), a2]; % add bias unit:              5000 x 26
a3 = sigmoid(a2 * Theta2');      % output layer activations:   5000 x 10
[~, p] = max(a3, [], 2);         % predicted label = index of the largest output
% =========================================================================

end
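After predicting, the training-set accuracy is typically reported as the fraction of examples whose predicted label matches the true label. A short usage sketch (assuming X, y, Theta1, and Theta2 are loaded from the exercise data files):

pred = predict(Theta1, Theta2, X);
fprintf('Training Set Accuracy: %f\n', mean(double(pred == y)) * 100);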

The other files and the dataset can be downloaded from Coursera.

Posted: 2024-09-29 10:08:35
