Backpropagation in BP Networks

Main reference for this post: How the backpropagation algorithm works (http://neuralnetworksanddeeplearning.com/chap2.html)

The figure below shows the parameter structure of a BP network.

[Figure: parameter structure of a BP network]

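First, define the output (activation) of the j-th neuron in layer l:

\[ a^l_j = \sigma\Big(\sum_k w^l_{jk}\, a^{l-1}_k + b^l_j\Big) \]

Here \(w^l_{jk}\) is the weight from the k-th neuron in layer l-1 to the j-th neuron in layer l, and \(b^l_j\) is the bias of the j-th neuron in layer l. For brevity, let

\[ z^l_j = \sum_k w^l_{jk}\, a^{l-1}_k + b^l_j \]

so that \(a^l_j = \sigma(z^l_j)\), where σ is the activation function.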
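As a minimal sketch of these two definitions, the snippet below computes z and a for a single layer. The logistic sigmoid is assumed as the activation (it is the one used in the full example later in this post); the helper name `layer_forward` and the numbers are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(w, b, a_prev):
    """Return (z, a) for one layer: z = w a_prev + b, a = sigma(z)."""
    z = np.dot(w, a_prev) + b
    return z, sigmoid(z)

# Hypothetical layer with 3 inputs and 2 neurons.
# Row j of w holds the weights w_{jk} feeding neuron j.
w = np.array([[0.1, -0.2, 0.3],
              [0.0, 0.5, -0.5]])
b = np.array([[0.0], [0.1]])
a_prev = np.array([[1.0], [0.0], [1.0]])   # activations of the previous layer

z, a = layer_forward(w, b, a_prev)
print(z.ravel())
print(a.ravel())
```

Storing the weights as a (neurons in layer l) x (neurons in layer l-1) matrix lets the sum over k become a single matrix-vector product.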

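Define the network's cost function, where n is the number of training samples, y(x) is the desired output for input x, and \(a^L(x)\) is the network's output for that input:

\[ C = \frac{1}{2n} \sum_x \| y(x) - a^L(x) \|^2 \]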
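A small sketch of such a cost, assuming the quadratic (mean squared error) form C = 1/(2n) Σ_x ||y(x) − a^L(x)||², which matches the `cost_derivative` used in the full example later in this post. The function name `quadratic_cost` and the sample values are illustrative only.

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """Quadratic cost averaged over n samples.

    outputs, targets: arrays of shape (n, output_dim), one row per sample.
    """
    n = outputs.shape[0]
    return np.sum((targets - outputs) ** 2) / (2.0 * n)

# Two made-up samples with two output neurons each.
outputs = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
targets = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

cost = quadratic_cost(outputs, targets)
print(cost)
```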

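The rest of this post shows how backpropagation computes the derivatives of the cost function with respect to the weights \(w^l_{jk}\) and the biases \(b^l_j\).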

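Clearly, once the input is fixed, the cost function depends only on the weights w and the biases b. To simplify the derivation, first compute \(\partial C / \partial z^l_j\) and define

\[ \delta^l_j \equiv \frac{\partial C}{\partial z^l_j} \]

Since \(a^L_j = \sigma(z^L_j)\), the chain rule gives, for the output layer,

\[ \delta^L_j = \frac{\partial C}{\partial a^L_j}\, \sigma'(z^L_j) \]

where L denotes the last layer of the network, i.e. the output layer. For a single training sample, the cost function can be written as

\[ C = \frac{1}{2} \sum_j (y_j - a^L_j)^2 \]

so that \(\partial C / \partial a^L_j = a^L_j - y_j\). Treating all outputs of the output layer as one column vector, \(\delta^L\) can be written as follows, where ⊙ denotes the element-wise (Hadamard) product of two vectors:

\[ \delta^L = \nabla_a C \odot \sigma'(z^L) = (a^L - y) \odot \sigma'(z^L) \]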
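The output-layer error can be checked numerically. The sketch below assumes the sigmoid activation and the single-sample quadratic cost, computes δ^L = (a^L − y) ⊙ σ'(z^L), and compares it with a central finite-difference estimate of ∂C/∂z^L_j. The helper names and the values of `z_L` and `y` are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def output_delta(z_L, y):
    """delta^L = (a^L - y) * sigma'(z^L), element-wise."""
    return (sigmoid(z_L) - y) * sigmoid_prime(z_L)

def cost_from_z(z_L, y):
    """C = 1/2 * ||y - sigma(z^L)||^2, viewed as a function of z^L."""
    return 0.5 * np.sum((y - sigmoid(z_L)) ** 2)

z_L = np.array([[0.3], [-1.2]])
y = np.array([[1.0], [0.0]])
delta_L = output_delta(z_L, y)

# Central finite differences of C with respect to each z^L_j.
eps = 1e-6
numeric = np.zeros_like(z_L)
for j in range(z_L.shape[0]):
    step = np.zeros_like(z_L)
    step[j, 0] = eps
    numeric[j, 0] = (cost_from_z(z_L + step, y)
                     - cost_from_z(z_L - step, y)) / (2 * eps)

max_error = np.max(np.abs(delta_L - numeric))
print(max_error)
```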

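Now comes the key question: how to obtain \(\delta^l\) from \(\delta^{l+1}\). This is where the intermediate quantity \(\partial C / \partial z^l_j\) pays off. Applying the chain rule through the z values of layer l+1:

\[ \delta^l_j = \frac{\partial C}{\partial z^l_j} = \sum_k \frac{\partial C}{\partial z^{l+1}_k} \frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum_k \delta^{l+1}_k \frac{\partial z^{l+1}_k}{\partial z^l_j} \]

\[ z^{l+1}_k = \sum_i w^{l+1}_{ki}\, \sigma(z^l_i) + b^{l+1}_k \quad\Rightarrow\quad \frac{\partial z^{l+1}_k}{\partial z^l_j} = w^{l+1}_{kj}\, \sigma'(z^l_j) \]

Therefore,

\[ \delta^l_j = \sum_k w^{l+1}_{kj}\, \delta^{l+1}_k\, \sigma'(z^l_j) \]

In vector form:

\[ \delta^l = \big((w^{l+1})^T \delta^{l+1}\big) \odot \sigma'(z^l) \]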
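A short sketch of one backward step, assuming the sigmoid activation. It evaluates the vector form δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l) and the component form with an explicit sum over k, and the two should agree. The function name `backward_step`, the layer sizes and the numbers are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backward_step(w_next, delta_next, z):
    """Vector form: delta^l = (w^{l+1}^T delta^{l+1}) * sigma'(z^l)."""
    return np.dot(w_next.T, delta_next) * sigmoid_prime(z)

# Hypothetical sizes: layer l has 3 neurons, layer l+1 has 2.
w_next = np.array([[0.2, -0.4, 0.1],
                   [0.7, 0.3, -0.6]])
delta_next = np.array([[0.5], [-0.25]])
z = np.zeros((3, 1))   # sigma'(0) = 0.25 for every neuron

delta = backward_step(w_next, delta_next, z)

# Component form: delta^l_j = sum_k w^{l+1}_{kj} delta^{l+1}_k sigma'(z^l_j)
delta_loop = np.zeros((3, 1))
for j in range(3):
    total = 0.0
    for k in range(2):
        total += w_next[k, j] * delta_next[k, 0]
    delta_loop[j, 0] = total * sigmoid_prime(z[j, 0])

print(delta.ravel())
print(delta_loop.ravel())
```

The transpose appears because the error flows backwards along the same weights that carried the activations forwards.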

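A similar derivation gives the derivatives with respect to the biases and the weights:

\[ \frac{\partial C}{\partial b^l_j} = \delta^l_j, \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\, \delta^l_j \]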
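To see these formulas at work, here is a hedged gradient check on a tiny two-layer network. It assumes the sigmoid activation and the single-sample quadratic cost, computes ∂C/∂b and ∂C/∂w for the first layer through the δ values, and compares one entry of each with a central finite-difference estimate. The network sizes, the random seed and the variable names are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost(w1, b1, w2, b2, x, y):
    """Single-sample quadratic cost of a two-layer network."""
    a1 = sigmoid(np.dot(w1, x) + b1)
    a2 = sigmoid(np.dot(w2, a1) + b2)
    return 0.5 * np.sum((y - a2) ** 2)

rng = np.random.RandomState(0)
w1, b1 = rng.randn(3, 2), rng.randn(3, 1)
w2, b2 = rng.randn(2, 3), rng.randn(2, 1)
x = np.array([[0.0], [1.0]])
y = np.array([[1.0], [0.0]])

# Forward pass, keeping z and a for each layer.
z1 = np.dot(w1, x) + b1
a1 = sigmoid(z1)
z2 = np.dot(w2, a1) + b2
a2 = sigmoid(z2)

# Backward pass.
delta2 = (a2 - y) * sigmoid_prime(z2)
delta1 = np.dot(w2.T, delta2) * sigmoid_prime(z1)
grad_b1 = delta1                 # dC/db^1_j    = delta^1_j
grad_w1 = np.dot(delta1, x.T)    # dC/dw^1_{jk} = a^0_k * delta^1_j

# Finite-difference estimates for w1[0, 1] and b1[0, 0].
eps = 1e-6
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[0, 1] += eps
w1_minus[0, 1] -= eps
numeric_w = (cost(w1_plus, b1, w2, b2, x, y)
             - cost(w1_minus, b1, w2, b2, x, y)) / (2 * eps)

b1_plus, b1_minus = b1.copy(), b1.copy()
b1_plus[0, 0] += eps
b1_minus[0, 0] -= eps
numeric_b = (cost(w1, b1_plus, w2, b2, x, y)
             - cost(w1, b1_minus, w2, b2, x, y)) / (2 * eps)

print(grad_w1[0, 1], numeric_w)
print(grad_b1[0, 0], numeric_b)
```

Because the second input component is 1, the weight gradient for `w1[0, 1]` equals the bias gradient for the same neuron, which is a quick sanity check on the outer-product form.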

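The key formulas above can be restated in matrix-multiplication form:

\[ \delta^L = \Sigma'(z^L)\, \nabla_a C \]

\[ \delta^l = \Sigma'(z^l)\, (w^{l+1})^T \delta^{l+1} \]

\[ \frac{\partial C}{\partial b^l} = \delta^l, \qquad \frac{\partial C}{\partial w^l} = \delta^l\, (a^{l-1})^T \]

Here \(\Sigma'(z^L)\) is the diagonal matrix whose main-diagonal entries are \(\sigma'(z^L_j)\), and \(\Sigma'(z^l)\) is defined in the same way for layer l. Once the derivatives of the cost function with respect to the weights \(w^l_{jk}\) and the biases \(b^l_j\) are known, any gradient-based optimization algorithm can be used to update the network's weights and biases. Below is a code example of a BP network with 2 inputs and 2 outputs that learns to logically negate each element of its input.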

import numpy as np

# tanh and its derivative are an alternative activation;
# the Network class below uses the sigmoid.
def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    x = np.tanh(x)
    return 1.0 - x ** 2

class Network(object):

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        # Each element of self.biases is a column vector.
        # self.weights has the same structure as in the book:
        # http://neuralnetworksanddeeplearning.com/chap2.html
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if "a" is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a

    def update_mini_batch(self, mini_batch, learning_rate=0.2):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The "mini_batch" is a list of tuples "(x, y)"."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]

        # delta_nabla_b is dC/db, delta_nabla_w is dC/dw for one sample
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w - (learning_rate / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (learning_rate / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]

        # feedforward
        activation = x
        activations = [x]   # list to store all the activations, layer by layer
        zs = []             # list to store all the z vectors, layer by layer

        # After this loop, activations = [a0, a1, ..., aL], zs = [z1, z2, ..., zL]
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)

        # backward pass
        # deltaL = dC/daL .* sigma'(zL)
        delta = (self.cost_derivative(activations[-1], y) *
                 sigmoid_prime(zs[-1]))

        # dC/dbL = deltaL
        # dC/dwL = deltaL * a(L-1)^T
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())

        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book. Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on. It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        #
        # z = z(L-l+1); l runs from 2 to self.num_layers - 1, i.e. to L
        # delta(L-l+1) = w(L-l+2)^T * delta(L-l+2) .* sigma'(z(L-l+1))
        # nabla_b[L-l+1] = delta(L-l+1)
        # nabla_w[L-l+1] = delta(L-l+1) * a(L-l)^T
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the raw output activations of the network for the
        column-vector input "test_data"."""
        test_results = self.feedforward(test_data)
        return test_results

    def cost_derivative(self, output_activations, y):
        """Return dC/da for the quadratic cost C = 1/2 * ||y - a||^2."""
        return (output_activations - y)

#### Miscellaneous functions
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# derivative of the sigmoid function
def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

if __name__ == '__main__':

    nn = Network([2, 2, 2])

    X = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])

    y = np.array([[1, 1],
                  [1, 0],
                  [0, 1],
                  [0, 0]])

    for k in range(40000):
        if k % 10000 == 0:
            print('epochs:', k)
        # Randomly select a sample.
        i = np.random.randint(X.shape[0])
        # Pass a one-element list so that len(mini_batch) works in Python 3.
        nn.update_mini_batch([(np.atleast_2d(X[i]).T, np.atleast_2d(y[i]).T)])

    for e in X:
        # Print (input, output) as a tuple.
        print((e, nn.evaluate(np.atleast_2d(e).T)))

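Output (the exact values differ from run to run, because the weights are initialized randomly and the training samples are drawn at random):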

epochs: 0
epochs: 10000
epochs: 20000
epochs: 30000
(array([0, 0]), array([[ 0.98389328],
       [ 0.97490859]]))
(array([0, 1]), array([[ 0.97694707],
       [ 0.01646559]]))
(array([1, 0]), array([[ 0.03149928],
       [ 0.97737158]]))
(array([1, 1]), array([[ 0.01347963],
       [ 0.02383405]]))

Time: 2024-11-03 05:29:07
