[ufldl]Supervised Neural Networks

The parts to implement are: forward prop, the softmax cost function, the gradient for each layer, and the penalty cost and its gradient.

  • forward prop

Forward prop is the process of feeding sample data into the neural network and passing it through the layers to obtain the network's output.

For a classification problem, the input and output of each layer are shown in the table below:

Layer          Input          Output
Input layer    sample data    feature map
Hidden layer   feature map    feature map
Output layer   feature map    probabilities of each potential class
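
To make the shapes concrete before looking at the code, here is a minimal setup sketch. The 784-dimensional input, 256-unit hidden layer, and 10 classes are illustrative MNIST-like assumptions (the 256 x 60000 shape appears in a comment in the code below); stack, numHidden, data, and hAct are the variables the code uses.

% Illustrative shapes only: 784-dim inputs, one 256-unit hidden layer, 10 classes.
m = 60000;                            % number of training examples
numHidden = 1;                        % one hidden layer
stack = cell(numHidden+1, 1);
stack{1} = struct('W', 0.01 * randn(256, 784), 'b', zeros(256, 1));  % hidden layer parameters
stack{2} = struct('W', 0.01 * randn(10, 256),  'b', zeros(10, 1));   % softmax output layer parameters
data = rand(784, m);                  % each column is one sample
hAct = cell(numHidden+1, 1);          % per-layer activations, filled in by forward prop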

So the output of the input layer is sigmoid(W^(1) x + b^(1)), the output of hidden layer l is sigmoid(W^(l) z^(l-1) + b^(l)), and the output layer produces the final probabilities by exponentiating its affine input, exp(W z + b), and normalizing each column to sum to 1 (i.e., a softmax). The code is as follows:

%% forward prop
%%% YOUR CODE HERE %%%
% hidden layers
for l = 1:numHidden
    if(l == 1)
        z = stack{l}.W * data;
    else
        z = stack{l}.W * hAct{l-1};
    end
    z = bsxfun(@plus, z, stack{l}.b);   % z: 256 x 60000, b: 256 x 1 (bias broadcast across columns)
    hAct{l} = 1 ./ (1 + exp(-z));       % element-wise sigmoid
end
% output layer: softmax over the last hidden layer's activations
h = exp(bsxfun(@plus, stack{numHidden+1}.W * hAct{numHidden}, stack{numHidden+1}.b));
pred_prob = bsxfun(@rdivide, h, sum(h, 1));
hAct{numHidden+1} = pred_prob;          % the last layer's output is the predicted class probabilities
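
As a quick sanity check on the forward pass (a sketch, assuming pred_prob was computed as above), every column of pred_prob should be a valid probability distribution:

% Each column of pred_prob should be non-negative and sum to 1.
assert(all(abs(sum(pred_prob, 1) - 1) < 1e-6));
assert(all(pred_prob(:) >= 0 & pred_prob(:) <= 1));
fprintf('pred_prob is %d x %d (classes x examples)\n', size(pred_prob, 1), size(pred_prob, 2));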
  • The softmax cost function

The difference between this step and the earlier softmax exercise is stated clearly in the tutorial: "Note that instead of making predictions from the input data x the softmax function takes as input the final hidden layer of the network". That is, the classifier's input is not the input data but the feature map produced by the last hidden layer, so the softmax cost function is

ceCost = -∑_{i=1}^{m} log h_{y^{(i)}}(x^{(i)})

where h(x^{(i)}) is the vector of class probabilities the network outputs for example i and y^{(i)} is its label.

The h here is exactly the probability vector pred_prob produced by the output layer. The code is as follows; for the details, refer to the original softmax exercise.

%% compute cost: the softmax cross-entropy cost
%%% YOUR CODE HERE %%%
logp = log(pred_prob);                                        % natural log, to match the softmax gradient used below
index = sub2ind(size(logp), labels', 1:size(pred_prob, 2));   % linear index of the correct class for each example
ceCost = -sum(logp(index));
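
An equivalent way to compute the same quantity, useful as a cross-check (a sketch assuming labels holds class indices 1..numClasses, one per column of pred_prob):

% Cross-check: build a one-hot ground-truth matrix and sum the picked log-probabilities.
m = size(pred_prob, 2);
numClasses = size(pred_prob, 1);
groundTruth = full(sparse(labels(:), (1:m)', 1, numClasses, m));   % one-hot columns
ceCost_check = -sum(sum(groundTruth .* log(pred_prob)));
% ceCost_check should match ceCost up to floating-point error.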
  • Computing the gradient for each layer

The backpropagation (BP) algorithm provides a convenient way to compute the gradient for each layer. The tutorial gives

∇_{W^{(l)}} J(W,b;x,y) = δ^{(l+1)} (a^{(l)})^T

where a^{(l)} is the activation of layer l (the sigmoid applied to Wx + b) and δ is the error term of each layer.

Personally, I think this formula is problematic; the correct formula should be

∇_{W^{(l)}} J = δ^{(l)} (a^{(l-1)})^T

That is, layer l's W gradient is obtained from layer l-1's activation and layer l's error, not from layer l's activation and layer l+1's error as written above. Similarly, the derivative of the cost function with respect to b should be given by the following formula:

∇_{b^{(l)}} J = δ^{(l)}

That is, it is obtained from layer l's error, not from layer l+1's error. The code later on is implemented according to these two formulas.

It is also possible that I have misunderstood this; corrections are welcome. A detailed derivation can be found here.

For the output layer, the error is easy to compute as the difference between the prediction and the ground-truth label (pred_prob minus the one-hot labels in the code below); for the hidden layers, the error has to be computed with the following formula:

δ^{(l)} = ((W^{(l+1)})^T δ^{(l+1)}) ⊙ f'(z^{(l)})

where z = Wx + b, f is the sigmoid function, and ⊙ denotes the element-wise product. The sigmoid has a very convenient property, f'(x) = f(x) * (1 - f(x)), which makes this cheap to compute. The code is as follows:

%% compute gradients using backpropagation

%%% YOUR CODE HERE %%%
% output layer: error = predicted probabilities - one-hot ground truth
output = zeros(size(pred_prob));
output(index) = 1;                       % one-hot encoding of the labels
error = pred_prob - output;

for l = numHidden+1 : -1 : 1
    gradStack{l}.b = sum(error, 2);
    if(l == 1)
        gradStack{l}.W = error * data';
        break;
    else
        gradStack{l}.W = error * hAct{l-1}';
    end
    error = (stack{l}.W)' * error .* hAct{l-1} .* (1 - hAct{l-1});  % this error corresponds to layer l-1
end
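
A simple way to validate these gradients is a finite-difference check on a few individual weights. This is only a sketch: computeCeCost is a hypothetical helper (not part of the starter code) that reruns the forward pass above for a given stack and returns ceCost.

% Finite-difference check of a single weight gradient (sketch; computeCeCost is hypothetical).
epsilon = 1e-4;
l = 1; i = 3; j = 7;                                 % an arbitrary weight to probe
stackPlus  = stack; stackPlus{l}.W(i,j)  = stackPlus{l}.W(i,j)  + epsilon;
stackMinus = stack; stackMinus{l}.W(i,j) = stackMinus{l}.W(i,j) - epsilon;
numGrad = (computeCeCost(stackPlus, data, labels) - computeCeCost(stackMinus, data, labels)) / (2 * epsilon);
fprintf('analytic %g vs numerical %g\n', gradStack{l}.W(i,j), numGrad);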
  • Computing the penalty cost and gradient

This part is mainly to prevent overfitting, so a regularization term on the weights W is added. The final cost function consists of the error term plus the regularization term on W, and the derivative of the cost function with respect to W gains an extra term, the derivative of the regularizer. The code is as follows:

%% compute weight penalty cost and gradient for non-bias terms
%%% YOUR CODE HERE %%%
% penalty cost
wCost = 0;
for l = 1:numHidden+1
    wCost = wCost + 0.5 * ei.lambda * sum(stack{l}.W(:) .^ 2);
end
cost = ceCost + wCost;

% gradient for non-bias terms
% (loop over all layers, including the output layer, so the gradient matches the cost above)
for l = numHidden+1:-1:1
    gradStack{l}.W = gradStack{l}.W + ei.lambda * stack{l}.W;
end
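
Written out, the regularized objective and its weight gradient (standard L2 weight decay matching the code above; the bias terms are not penalized) are

cost = ceCost + (λ/2) ∑_{l} ||W^{(l)}||_F^2

∇_{W^{(l)}} cost = ∇_{W^{(l)}} ceCost + λ W^{(l)}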

References:

[1] http://neuralnetworksanddeeplearning.com/chap2.html

[2] http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

[3] http://blog.csdn.net/lingerlanlan/article/details/38464317
