Stanford Machine Learning Open Course: Programming Exercise 2: Logistic Regression (MATLAB Implementation)

1 Logistic Regression

In this part of the exercise, we build a logistic regression model to predict whether a student gets admitted into a university, estimating each applicant's chance of admission from the scores on two exams.

1.1 Visualizing the data

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.

% Find Indices of Positive and Negative Examples
pos = find(y == 1); neg = find(y == 0);  % index vectors of the positive (y = 1) and negative (y = 0) examples
% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 2, ...
'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', ...
'MarkerSize', 7);

% =========================================================================

hold off;

end

function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
%   PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the
%   positive examples and o for the negative examples. X is assumed to be
%   either
%   1) Mx3 matrix, where the first column is an all-ones column for the
%      intercept.
%   2) MxN, N>3 matrix, where the first column is all-ones

% Plot Data
plotData(X(:,2:3), y);
hold on

if size(X, 2) <= 3  % only two features (plus the intercept): the boundary is a straight line
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2,  max(X(:,2))+2];  % x-range: min/max of the first feature, padded by 2 on each side

    % Calculate the decision boundary line from theta(1) + theta(2)*x1 + theta(3)*x2 = 0
    % (the p = 0.5 contour), solved for x2
    plot_y = (-1./theta(3)).*( theta(2).*plot_x + theta(1));

    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y)

    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])

else
    % Here is the grid range
    u = linspace(-1, 1.5, 50); % 50 evenly spaced grid points in [-1, 1.5]
    v = linspace(-1, 1.5, 50);

    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z'; % important to transpose z before calling contour

    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2)  % draw only the z = 0 level curve, i.e. the decision boundary
end
hold off

end

(The mapFeature function called by plotDecisionBoundary above is listed in full in Section 2.2.)
1.2 Implementation

1.2.1 Warmup exercise: sigmoid function

The logistic regression hypothesis is:

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

where g is the sigmoid function. For a matrix or vector input, the sigmoid function is applied element-wise.

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   J = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).

g = 1 ./ (1 + exp(-z));

% =============================================================

end
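A quick illustrative check of the element-wise behaviour (not part of the exercise files):

% sigmoid works on scalars, vectors, and matrices alike
sigmoid(0)              % 0.5
sigmoid([-10 0 10])     % approximately [0.0000  0.5000  1.0000]
sigmoid(zeros(2))       % 2x2 matrix of 0.5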

1.2.2   Cost function and gradient

The costFunction function implements the cost function and gradient for logistic regression and returns both. (The listing below is the regularized variant, costFunctionReg; with lambda = 0 it reduces to the unregularized cost used in this part.)

Cost:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\right]$$

Gradient:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$$

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

n = length(theta);
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

predictions = sigmoid(X*theta);         % m x 1 vector of hypothesis values on all m examples
J = 1/m *(-y'*log(predictions)-(1-y)'*log(1-predictions)) + 1/(2*m)*lambda*(theta'*theta-(theta(1,1))^2);   % regularized cost; theta(1) is excluded from the penalty

% The intercept term theta(1) is not regularized
grad(1,1) = 1/m *(predictions-y)'*X(:,1);
grad(2:n,1) = (1/m *(predictions-y)'*X(:,2:n))' + 1/m *lambda*theta(2:n,1);

% =============================================================

end
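The driver script in Section 1.2.3 also calls costFunction.m (the unregularized version), which is not listed in this post. A minimal vectorized sketch, assuming the standard skeleton from the exercise:

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. the parameters.

m = length(y);                      % number of training examples

predictions = sigmoid(X*theta);     % m x 1 vector of hypothesis values
J = 1/m * (-y'*log(predictions) - (1-y)'*log(1-predictions));   % cross-entropy cost
grad = 1/m * X' * (predictions - y);                            % gradient, same size as theta

end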

1.2.3  Learning parameters using fminunc

Octave/MATLAB's fminunc is an optimization solver that finds the minimum of an unconstrained function.

You will pass to fminunc the following inputs:

  • The initial values of the parameters we are trying to optimize
  • A function that, when given the training set and a particular theta, computes the logistic regression cost and gradient with respect to theta for the dataset (X, y), as in the sketch below
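In code this amounts to wrapping costFunction in an anonymous function of theta and passing it to fminunc together with the options; a minimal sketch (the complete driver script ex2.m is reproduced immediately below):

% Minimal sketch of the fminunc call used in the script below
options = optimset('GradObj', 'on', 'MaxIter', 400);   % we supply the gradient; cap iterations at 400
[theta, cost] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);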
%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the logistic
%  regression exercise. You will need to complete the following functions 
%  in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the exam scores and the third column
%  contains the label.

data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);

%% ==================== Part 1: Plotting ====================
%  We start the exercise by first plotting the data to understand the
%  problem we are working with.

fprintf(['Plotting data with + indicating (y = 1) examples and o ' ...
         'indicating (y = 0) examples.\n']);

plotData(X, y);

% Put some labels 
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============ Part 2: Compute Cost and Gradient ============
%  In this part of the exercise, you will implement the cost and gradient
%  for logistic regression. You need to complete the code in 
%  costFunction.m

%  Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);

% Add intercept term to x and X_test
X = [ones(m, 1) X];

% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);

% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);
fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 3: Optimizing using fminunc  =============
%  In this exercise, you will use a built-in function (fminunc) to find the
%  optimal parameters theta.

%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost 
[theta, cost] = ...
	fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('theta: \n');
fprintf(' %f \n', theta);

% Plot Boundary
plotDecisionBoundary(theta, X, y);

% Put some labels 
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============== Part 4: Predict and Accuracies ==============
%  After learning the parameters, you'd like to use them to predict the outcomes
%  on unseen data. In this part, you will use the logistic regression model
%  to predict the probability that a student with score 45 on exam 1 and 
%  score 85 on exam 2 will be admitted.
%
%  Furthermore, you will compute the training and test set accuracies of 
%  our model.
%
%  Your task is to complete the code in predict.m

%  Predict probability for a student with score 45 on exam 1 
%  and score 85 on exam 2 

prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission ' ...
         'probability of %f\n\n'], prob);

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;
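predict.m is called above (and again in Part 2) but is not listed in this post. A minimal sketch, assuming the usual 0.5 threshold on the hypothesis:

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., predict 1 if sigmoid(theta'*x) >= 0.5)

p = sigmoid(X * theta) >= 0.5;   % m x 1 logical vector of 0/1 predictions

end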

2 Regularized logistic regression

In this part, we implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA).

2.1 Visualizing the data

2.2 Feature mapping

One way to fit the data better is to create more features from each data point. In the provided function mapFeature.m, the two features are mapped into 28 polynomial terms of x1 and x2, up to the sixth power.

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising of
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end
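An illustrative check of the output dimensions (hypothetical feature values; the mapping yields 1 + 2 + 3 + ... + 7 = 28 columns, including the bias column of ones):

X1 = [0.5; -0.2];  X2 = [0.7; 0.3];   % two hypothetical examples
size(mapFeature(X1, X2))              % ans = 2 28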

2.3  Cost function and gradient

Now we implement the cost function and gradient for regularized logistic regression; the regularization term helps to avoid overfitting.

Cost function:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

Gradient (the intercept term $\theta_0$ is not regularized):

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_0^{(i)}$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \ge 1)$$

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

n = length(theta);
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

predictions = sigmoid(X*theta);         % m x 1 vector of hypothesis values on all m examples
J = 1/m *(-y'*log(predictions)-(1-y)'*log(1-predictions)) + 1/(2*m)*lambda*(theta'*theta-(theta(1,1))^2);   % regularized cost; theta(1) is excluded from the penalty

% The intercept term theta(1) is not regularized
grad(1,1) = 1/m *(predictions-y)'*X(:,1);
grad(2:n,1) = (1/m *(predictions-y)'*X(:,2:n))' + 1/m *lambda*theta(2:n,1);

% =============================================================

end
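A quick sanity check on the implementation: with theta = 0 every prediction is 0.5 and the penalty term vanishes, so the cost must equal log(2) ≈ 0.6931 regardless of the data. An illustrative check, assuming data has been loaded from ex2data2.txt as in the script below:

% Sanity check: cost at theta = 0 should be -log(0.5) = 0.6931
X = mapFeature(data(:,1), data(:,2));
initial_theta = zeros(size(X, 2), 1);
J0 = costFunctionReg(initial_theta, X, data(:,3), 1);
fprintf('Cost at initial theta (zeros): %f (expected ~0.6931)\n', J0);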

2.4 Plotting the decision boundary

For this nonlinear case the boundary is drawn by the contour branch of plotDecisionBoundary (Section 1.1): z = mapFeature(u, v)*theta is evaluated on a grid and the z = 0 level curve is plotted.

2.5 Optional (ungraded) exercises

%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the second part
%  of the exercise which covers regularization with logistic regression.
%
%  You will need to complete the following functions in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the X values and the third column
%  contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);

% Put some labels
hold on;

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

% Specified in plot order
legend('y = 1', 'y = 0')
hold off;

%% =========== Part 1: Regularized Logistic Regression ============
%  In this part, you are given a dataset with data points that are not
%  linearly separable. However, you would still like to use logistic
%  regression to classify the data points.
%
%  To do so, you introduce more features to use -- in particular, you add
%  polynomial features to our data matrix (similar to polynomial
%  regression).
%

% Add Polynomial Features

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
%size(X)
X = mapFeature(X(:,1), X(:,2));
%size(X)
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1
lambda = 1;

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

fprintf('Cost at initial theta (zeros): %f\n', cost);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 2: Regularization and Accuracies =============
%  Optional Exercise:
%  In this part, you will get to try different values of lambda and
%  see how regularization affects the decision boundary
%
%  Try the following values of lambda (0, 1, 10, 100).
%
%  How does the decision boundary change when you vary lambda? How does
%  the training set accuracy vary?
%

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
	fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
