Handwritten Digit Recognition with a Deep Network

The network is built from autoencoders. It has an input layer and two hidden layers, followed by a softmax classifier.
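Below is a minimal NumPy sketch of this architecture's forward pass. It assumes sigmoid hidden units (as in the MATLAB code later in the post), layer sizes 784 → 200 → 200, and a 10-class softmax on top. The weights are random placeholders, not trained values, and the helper names `forward` and `softmax` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    # Subtract each column's max before exponentiating, for numerical stability
    scores = scores - scores.max(axis=0, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=0, keepdims=True)

def forward(x, layers, softmax_theta):
    """x: (input_size, m), one example per column.
    layers: list of (W, b) pairs, one per autoencoder encoder.
    Returns class probabilities of shape (num_classes, m)."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b[:, None])    # activations of feature1, then feature2
    return softmax(softmax_theta @ a)

rng = np.random.default_rng(0)
input_size, h1, h2, num_classes = 28 * 28, 200, 200, 10
layers = [(0.01 * rng.standard_normal((h1, input_size)), np.zeros(h1)),
          (0.01 * rng.standard_normal((h2, h1)), np.zeros(h2))]
softmax_theta = 0.005 * rng.standard_normal((num_classes, h2))

x = rng.random((input_size, 5))            # 5 fake "images" with pixels in [0, 1)
probs = forward(x, layers, softmax_theta)
pred = probs.argmax(axis=0) + 1            # +1: classes are numbered 1..10 (digit 0 -> 10)
```

Each column of `probs` holds one image's class probabilities. The prediction is the most probable class, shifted to the 1-based numbering that the exercise uses.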

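Training is greedy and proceeds layer by layer. First, the input layer and feature1 are treated as one autoencoder, and the parameters between them are trained. The feature1 activations are then fed into feature2: feature1 and feature2 are treated as a second autoencoder, and the parameters between these two layers are trained. Neither step uses the class labels, so both are unsupervised. Finally, the feature2 activations are passed to the classifier as the extracted features. This step needs the labels to compute the cost function, and the parameters between feature2 and the classifier are trained by minimizing that cost, so it is supervised. Once training is complete, feeding a test sample through the network gives the probability that the sample belongs to each class. The class with the highest probability is the final prediction.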
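Each unsupervised stage minimizes a sparse autoencoder cost. The NumPy sketch below assumes the standard UFLDL formulation that `sparseAutoencoderCost.m` is expected to implement. That formulation has three terms: squared reconstruction error, weight decay weighted by `lambda`, and a KL-divergence sparsity penalty weighted by `beta` that pulls each hidden unit's average activation toward `sparsityParam`. The function names and the flat parameter layout are this sketch's own choices. A finite-difference check on tiny random data confirms the analytic gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, visible, hidden):
    # Flat layout chosen for this sketch: [W1, W2, b1, b2]
    n = hidden * visible
    W1 = theta[:n].reshape(hidden, visible)
    W2 = theta[n:2 * n].reshape(visible, hidden)
    b1 = theta[2 * n:2 * n + hidden]
    b2 = theta[2 * n + hidden:]
    return W1, W2, b1, b2

def sparse_ae_cost(theta, visible, hidden, lam, rho, beta, X):
    """Cost and gradient of a sigmoid sparse autoencoder; X is (visible, m)."""
    W1, W2, b1, b2 = unpack(theta, visible, hidden)
    m = X.shape[1]
    a2 = sigmoid(W1 @ X + b1[:, None])     # hidden activations (the learned features)
    a3 = sigmoid(W2 @ a2 + b2[:, None])    # reconstruction of X
    rho_hat = a2.mean(axis=1)              # average activation of each hidden unit

    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    cost = (0.5 / m) * np.sum((a3 - X) ** 2) \
        + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) \
        + beta * np.sum(kl)

    # Backpropagation: output error, then hidden error including the sparsity term
    d3 = (a3 - X) * a3 * (1 - a3)
    sparsity = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparsity[:, None]) * a2 * (1 - a2)

    grad = np.concatenate([
        (d2 @ X.T / m + lam * W1).ravel(),
        (d3 @ a2.T / m + lam * W2).ravel(),
        d2.sum(axis=1) / m,
        d3.sum(axis=1) / m,
    ])
    return cost, grad

def numerical_grad(f, theta, eps=1e-4):
    # Central differences, one parameter at a time
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
visible, hidden, m = 8, 5, 10
X = rng.random((visible, m))
theta = 0.1 * rng.standard_normal(2 * visible * hidden + visible + hidden)
f = lambda t: sparse_ae_cost(t, visible, hidden, 3e-3, 0.1, 3.0, X)
cost, grad = f(theta)
num = numerical_grad(lambda t: f(t)[0], theta)
rel_err = np.linalg.norm(num - grad) / np.linalg.norm(num + grad)
```

For the second stage, the same function would be reused with the first layer's hidden activations `sigmoid(W1 @ X + b1)` as its input. That is presumably what `feedForwardAutoencoder` computes in the MATLAB code.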

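To make the classification more accurate, the trained parameters can be fine-tuned. After the supervised step, the labeled training data are used to compute the classification error (the residual at the output layer). This error is backpropagated through the whole network to further adjust all of the parameters learned so far, which improves the final prediction accuracy considerably.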
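Fine-tuning treats the two encoders and the softmax layer as a single network and minimizes the softmax cross-entropy by backpropagation. Below is a NumPy sketch of the cost and gradients. It assumes the same setup as the `stackedAECost` function later in the post: sigmoid layers, with weight decay applied only to the softmax weights. Costs and gradients are averaged over the `m` training examples. The function name and argument order are hypothetical. The block ends with a finite-difference check of the first layer's weight gradient, which is the deepest path through the backpropagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_cost(softmax_theta, stack, lam, X, labels, num_classes):
    """softmax_theta: (num_classes, h_last); stack: list of (W, b) pairs;
    X: (input_size, m); labels: 1-based ints of shape (m,)."""
    m = X.shape[1]
    ground_truth = np.zeros((num_classes, m))
    ground_truth[labels - 1, np.arange(m)] = 1.0     # one-hot, one column per example

    acts = [X]
    for W, b in stack:                               # forward pass through the encoders
        acts.append(sigmoid(W @ acts[-1] + b[:, None]))

    scores = softmax_theta @ acts[-1]
    scores -= scores.max(axis=0, keepdims=True)      # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=0, keepdims=True)

    cost = -np.sum(ground_truth * np.log(p)) / m + 0.5 * lam * np.sum(softmax_theta ** 2)
    softmax_grad = -(ground_truth - p) @ acts[-1].T / m + lam * softmax_theta

    # Backpropagate the classification error through the sigmoid layers
    delta = -(softmax_theta.T @ (ground_truth - p)) * acts[-1] * (1 - acts[-1])
    stack_grads = [None] * len(stack)
    for layer in range(len(stack) - 1, -1, -1):
        W, _ = stack[layer]
        stack_grads[layer] = (delta @ acts[layer].T / m, delta.sum(axis=1) / m)
        if layer > 0:
            delta = (W.T @ delta) * acts[layer] * (1 - acts[layer])
    return cost, softmax_grad, stack_grads

# Tiny random network for a finite-difference check
rng = np.random.default_rng(1)
sizes = [6, 4, 3]
num_classes, m, lam = 3, 7, 1e-4
X = rng.random((sizes[0], m))
labels = rng.integers(1, num_classes + 1, size=m)
stack = [(0.5 * rng.standard_normal((sizes[i + 1], sizes[i])),
          0.1 * rng.standard_normal(sizes[i + 1])) for i in range(len(sizes) - 1)]
softmax_theta = 0.5 * rng.standard_normal((num_classes, sizes[-1]))

_, _, stack_grads = finetune_cost(softmax_theta, stack, lam, X, labels, num_classes)
W0 = stack[0][0]                                     # perturbed in place, then restored
num = np.zeros_like(W0)
eps = 1e-5
for idx in np.ndindex(W0.shape):
    orig = W0[idx]
    W0[idx] = orig + eps
    c_plus = finetune_cost(softmax_theta, stack, lam, X, labels, num_classes)[0]
    W0[idx] = orig - eps
    c_minus = finetune_cost(softmax_theta, stack, lam, X, labels, num_classes)[0]
    W0[idx] = orig
    num[idx] = (c_plus - c_minus) / (2 * eps)
max_abs_err = np.max(np.abs(num - stack_grads[0][0]))
```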

Below are the features learned by the first layer.

They are mostly edges of pen strokes.
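The MATLAB code visualizes these features with `display_network(W1')`, which presumably draws each row of `W1` (one hidden unit's input weights) as a 28×28 image. Below is a rough NumPy analogue that only builds the tiled image array, with no plotting. It assumes the pixel vectors were flattened in MATLAB's column-major order. The helper name `tile_filters` and its per-filter normalization are this sketch's own choices, not `display_network`'s exact behavior.

```python
import numpy as np

def tile_filters(W, patch=28, cols=None, pad=1):
    """Arrange each row of W (num_filters, patch*patch) as a patch x patch tile in a grid."""
    n = W.shape[0]
    cols = cols or int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    size = patch + pad
    grid = np.zeros((rows * size + pad, cols * size + pad))
    for k in range(n):
        # Column-major reshape, assumed to match how the pixels were flattened
        tile = W[k].reshape((patch, patch), order="F")
        tile = tile - tile.mean()
        scale = np.abs(tile).max()
        if scale > 0:
            tile = tile / scale                # each filter scaled into [-1, 1] independently
        r, c = divmod(k, cols)
        top, left = pad + r * size, pad + c * size
        grid[top:top + patch, left:left + patch] = tile
    return grid

rng = np.random.default_rng(0)
W1 = rng.standard_normal((200, 28 * 28))       # stand-in for the learned first-layer weights
img = tile_filters(W1)                         # 14 x 15 grid of 28x28 tiles
```

The resulting array can be passed to any image viewer. With trained weights, edge-like stroke detectors would show up as light/dark bands inside the tiles.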

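For comparison, the results below show that classification accuracy rises substantially after fine-tuning. After training a deep network, it is therefore well worth fine-tuning it with some labeled data: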

Before Finetuning Test Accuracy: 91.760%
After Finetuning Test Accuracy: 97.710%
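The two accuracies are easier to compare as counts. Assume they were measured on the full MNIST test set loaded in the test step (the t10k files, 10,000 images). The arithmetic then works out as follows:

```python
n_test = 10_000                      # MNIST t10k test images
before_pct, after_pct = 91.760, 97.710

correct_before = round(n_test * before_pct / 100)
correct_after = round(n_test * after_pct / 100)
errors_before = n_test - correct_before
errors_after = n_test - correct_after
fixed = errors_before - errors_after
relative_reduction = fixed / errors_before   # fraction of the original errors removed
```

Under that assumption, fine-tuning reduces the misclassified test images from 824 to 229, roughly 72% fewer errors.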

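Below is part of the program code. For the complete code, first download minFunc.rar and then stacked_exercise.rar. minFunc.rar contains the L-BFGS optimizer (lbfgs) used to optimize the network parameters.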

%% CS294A/CS294W Stacked Autoencoder Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  stacked autoencoder exercise. You will need to complete code in
%  stackedAECost.m
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises. You will need the initializeParameters.m,
%  loadMNISTImages.m, and loadMNISTLabels.m files from previous exercises.
%
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file.
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameter values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

inputSize = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200;    % Layer 1 Hidden Size
hiddenSizeL2 = 200;    % Layer 2 Hidden Size
sparsityParam = 0.1;   % desired average activation of the hidden units.
                       % (This was denoted by the Greek letter rho, which looks like a lower-case "p",
                       %  in the lecture notes).
lambda = 3e-3;         % weight decay parameter
beta = 3;              % weight of sparsity penalty term       

%%======================================================================
%% STEP 1: Load data from the MNIST database
%
%  This loads our training data from the MNIST database files.

% Load MNIST database files
trainData = loadMNISTImages('train-images.idx3-ubyte');
trainLabels = loadMNISTLabels('train-labels.idx1-ubyte');

trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1

%%======================================================================
%% STEP 2: Train the first sparse autoencoder
%  This trains the first sparse autoencoder on the unlabelled MNIST training
%  images.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

%  Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the first layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL1"
%                You should store the optimal parameters in sae1OptTheta
addpath minFunc/;
options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
% Train the parameters of the first layer
[sae1OptTheta, cost] =  minFunc(@(p) sparseAutoencoderCost(p,...
                        inputSize,hiddenSizeL1,lambda,...
                        sparsityParam,beta,trainData),...
                        sae1Theta,options);
save('step2.mat', 'sae1OptTheta');
W1 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
display_network(W1');
% -------------------------------------------------------------------------

%%======================================================================
%% STEP 3: Train the second sparse autoencoder
%  This trains the second sparse autoencoder on the first autoencoder
%  features.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

%  Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the second layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL2" and an input size of
%                "hiddenSizeL1"
%
%                You should store the optimal parameters in sae2OptTheta
[sae2OptTheta, cost] =  minFunc(@(p)sparseAutoencoderCost(p,...
                        hiddenSizeL1,hiddenSizeL2,lambda,...
                        sparsityParam,beta,sae1Features),...
                        sae2Theta,options);
% figure;
% W11 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
% W2 = reshape(sae2OptTheta(1:hiddenSizeL2 * hiddenSizeL1), hiddenSizeL2, hiddenSizeL1);
% figure;
% display_network(W2');
% -------------------------------------------------------------------------

%%======================================================================
%% STEP 4: Train the softmax classifier
%  This trains the softmax classifier on the second autoencoder features.
%  If you've correctly implemented softmaxCost.m, you don't need
%  to change anything here.

[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

%  Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the softmax classifier, the classifier takes in
%                input of dimension "hiddenSizeL2" corresponding to the
%                hidden layer size of the 2nd layer.
%
%                You should store the optimal parameters in saeSoftmaxOptTheta
%
%  NOTE: If you used softmaxTrain to complete this part of the exercise,
%        set saeSoftmaxOptTheta = softmaxModel.optTheta(:);
softmaxLambda = 1e-4;
numClasses = 10;
softoptions = struct;
softoptions.maxIter = 400;
softmaxModel = softmaxTrain(hiddenSizeL2,numClasses,softmaxLambda,...
                            sae2Features,trainLabels,softoptions);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);

save('step4.mat', 'saeSoftmaxOptTheta');

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 5: Finetune softmax model

% Implement the stackedAECost to give the combined cost of the whole model
% then run this cell.

% Initialize the stack using the parameters learned
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack);
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the deep network, hidden size here refers to the
%                dimension of the input to the classifier, which corresponds
%                to "hiddenSizeL2".
%
%
[stackedAEOptTheta, cost] =  minFunc(@(p)stackedAECost(p,inputSize,hiddenSizeL2,...
                         numClasses, netconfig,lambda, trainData, trainLabels),...
                        stackedAETheta,options);
save('step5.mat', 'stackedAEOptTheta');
% -------------------------------------------------------------------------

%%======================================================================
%% STEP 6: Test
%  Instructions: You will need to complete the code in stackedAEPredict.m
%                before running this part of the code
%

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('t10k-images.idx3-ubyte');
testLabels = loadMNISTLabels('t10k-labels.idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy:  97.6%
%
% If your values are too low (accuracy less than 95%), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)

% ===== stackedAECost.m =====
function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                              numClasses, netconfig, ...
                                              lambda, data, labels)

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.

% theta: trained weights from the autoencoder
% inputSize:   the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% netconfig:   the network configuration of the stack
% lambda:      the weight regularization penalty
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example

%% Unroll softmaxTheta parameter

% We first extract the softmax parameters
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));

%% --------------------------- YOUR CODE HERE -----------------------------
%  Instructions: Compute the cost function and gradient vector for
%                the stacked autoencoder.
%
%                You are given a stack variable which is a cell-array of
%                the weights and biases for every layer. In particular, you
%                can refer to the weights of Layer d, using stack{d}.w and
%                the biases using stack{d}.b . To get the total number of
%                layers, you can use numel(stack).
%
%                The last layer of the network is connected to the softmax
%                classification layer, softmaxTheta.
%
%                You should compute the gradients for the softmaxTheta,
%                storing that in softmaxThetaGrad. Similarly, you should
%                compute the gradients for each layer in the stack, storing
%                the gradients in stackgrad{d}.w and stackgrad{d}.b
%                Note that the size of the matrices in stackgrad should
%                match exactly that of the size of the matrices in stack.
%

depth = numel(stack);
z = cell(depth+1,1);
a = cell(depth+1, 1);
a{1} = data;

for layer = (1:depth)
  z{layer+1} = stack{layer}.w * a{layer} + repmat(stack{layer}.b, [1, size(a{layer},2)]);
  a{layer+1} = sigmoid(z{layer+1});
end

scores = softmaxTheta * a{depth+1};
scores = bsxfun(@minus, scores, max(scores)); % subtract the column max for numerical stability
p = bsxfun(@rdivide, exp(scores), sum(exp(scores)));

% Average over the M training examples (M = size(data, 2) above)
cost = -1/M * groundTruth(:)' * log(p(:)) + lambda/2 * sum(softmaxTheta(:) .^ 2);
softmaxThetaGrad = -1/M * (groundTruth - p) * a{depth+1}' + lambda * softmaxTheta;

d = cell(depth+1, 1);

d{depth+1} = -(softmaxTheta' * (groundTruth - p)) .* a{depth+1} .* (1-a{depth+1});

for layer = (depth:-1:2)
  d{layer} = (stack{layer}.w' * d{layer+1}) .* a{layer} .* (1-a{layer});
end

for layer = (depth:-1:1)
  stackgrad{layer}.w = (1/M) * d{layer+1} * a{layer}';
  stackgrad{layer}.b = (1/M) * sum(d{layer+1}, 2);
end
% -------------------------------------------------------------------------

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

% ===== stackedAEPredict.m =====
function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.

% theta: trained weights from the autoencoder
% inputSize:   the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% data: Our matrix containing the data to classify as columns.  So, data(:,i) is the i-th example.

% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

%% Unroll theta parameter

% We first extract the softmax parameters
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

depth = numel(stack);
z = cell(depth+1,1);
a = cell(depth+1, 1);
a{1} = data;

for layer = (1:depth)
  z{layer+1} = stack{layer}.w * a{layer} + repmat(stack{layer}.b, [1, size(a{layer},2)]);
  a{layer+1} = sigmoid(z{layer+1});
end

[~, pred] = max(softmaxTheta * a{depth+1});

% -----------------------------------------------------------

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
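The `stack{1}.b` and `stack{2}.b` offsets in the fine-tuning step assume a particular packing. Each autoencoder's parameter vector is taken to be `[W1(:); W2(:); b1; b2]`, as `initializeParameters.m` in the exercise builds it. Under that packing, the first-layer bias starts right after both weight matrices, at MATLAB index `2*hiddenSize*visibleSize+1`. Below is a small NumPy sketch of that layout with hypothetical helper names and tiny sizes. It uses Fortran order to mimic MATLAB's column-major `(:)` and `reshape`.

```python
import numpy as np

def pack(W1, W2, b1, b2):
    # Mimic MATLAB's [W1(:); W2(:); b1; b2]: (:) flattens column by column
    return np.concatenate([W1.ravel(order="F"), W2.ravel(order="F"), b1, b2])

def unpack_encoder(theta, hidden, visible):
    # 0-based equivalent of reshape(theta(1:hidden*visible), hidden, visible)
    # and theta(2*hidden*visible+1 : 2*hidden*visible+hidden)
    n = hidden * visible
    W1 = theta[:n].reshape((hidden, visible), order="F")
    b1 = theta[2 * n:2 * n + hidden]
    return W1, b1

rng = np.random.default_rng(0)
hidden, visible = 3, 4
W1 = rng.standard_normal((hidden, visible))
W2 = rng.standard_normal((visible, hidden))
b1 = rng.standard_normal(hidden)
b2 = rng.standard_normal(visible)

theta = pack(W1, W2, b1, b2)
W1_back, b1_back = unpack_encoder(theta, hidden, visible)
```

Only the encoder half (`W1`, `b1`) is kept when building the stack. The decoder weights `W2` and `b2` are used during pretraining only.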
Date: 2024-10-09 02:37:16
