Backpropagation (BP) Algorithm Walkthrough

This article is reposted from https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

Background

Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example with actual numbers. This post is my attempt to explain how it works with a concrete example that folks can compare their own calculations to in order to ensure they understand backpropagation correctly.

If this kind of thing interests you, you should sign up for my newsletter where I post about AI-related projects that I’m working on.

Backpropagation in Python

You can play around with a Python script that I wrote that implements the backpropagation algorithm in this Github repo.

Backpropagation Visualization

For an interactive visualization showing a neural network as it learns, check out my Neural Network visualization.

Additional Resources

If you find this tutorial useful and want to continue learning about neural networks and their applications, I highly recommend checking out Adrian Rosebrock’s excellent tutorial on Getting Started with Deep Learning and Python.

Overview

For this tutorial, we’re going to use a neural network with two inputs, two hidden neurons, and two output neurons. Additionally, the hidden and output neurons will include a bias.

Here’s the basic structure:

In order to have some numbers to work with, here are the initial weights, the biases, and training inputs/outputs:
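
For reference, these are the initial values used in the original post:

w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, b1 = 0.35 (w1, w2 feed h1; w3, w4 feed h2; both hidden neurons share b1)
w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, b2 = 0.60 (w5, w6 feed o1; w7, w8 feed o2; both output neurons share b2)
Inputs: i1 = 0.05, i2 = 0.10. Target outputs: 0.01 for o1 and 0.99 for o2.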

The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.

For the rest of this tutorial we’re going to work with a single training example: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.

The Forward Pass

To begin, let’s see what the neural network currently predicts given the weights and biases above and inputs of 0.05 and 0.10. To do this we’ll feed those inputs forward through the network.

We figure out the total net input to each hidden layer neuron, squash the total net input using an activation function (here we use the logistic function), then repeat the process with the output layer neurons.

Total net input is also referred to as just net input by some sources.

Here’s how we calculate the total net input for h1, squash it using the logistic function to get the output of h1, and then carry out the same process for h2.
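
A worked sketch with the initial values listed above:

$$
\begin{aligned}
net_{h1} &= w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1 = 0.15 \cdot 0.05 + 0.20 \cdot 0.10 + 0.35 \cdot 1 = 0.3775 \\
out_{h1} &= \frac{1}{1 + e^{-net_{h1}}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992 \\
net_{h2} &= 0.25 \cdot 0.05 + 0.30 \cdot 0.10 + 0.35 \cdot 1 = 0.3925, \qquad out_{h2} = 0.596884378
\end{aligned}
$$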

We repeat this process for the output layer neurons, using the output from the hidden layer neurons as inputs.

Here’s the output for o1, and carrying out the same process gives the output for o2.
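
Numerically, using the hidden-layer outputs just computed:

$$
\begin{aligned}
net_{o1} &= w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1 = 0.40 \cdot 0.593269992 + 0.45 \cdot 0.596884378 + 0.60 \cdot 1 = 1.105905967 \\
out_{o1} &= \frac{1}{1 + e^{-1.105905967}} = 0.75136507 \\
net_{o2} &= 1.224921404, \qquad out_{o2} = 0.772928465
\end{aligned}
$$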

Calculating the Total Error

We can now calculate the error for each output neuron using the squared error function and sum them to get the total error: E_total = Σ (1/2)(target - output)^2.

Some sources refer to the target as the ideal and the output as the actual.

The 1/2 is included so that the exponent is cancelled when we differentiate later on. The result is eventually multiplied by a learning rate anyway so it doesn’t matter that we introduce a constant here [1].

For example, the target output for o1 is 0.01 but the neural network outputs 0.75136507, therefore its error is E_o1 = (1/2)(target_o1 - out_o1)^2. Repeating this process for o2 (remembering that the target is 0.99) and summing gives the total error for the neural network.
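
Worked out with these values:

$$
\begin{aligned}
E_{o1} &= \tfrac{1}{2}(target_{o1} - out_{o1})^2 = \tfrac{1}{2}(0.01 - 0.75136507)^2 = 0.274811083 \\
E_{o2} &= \tfrac{1}{2}(0.99 - 0.772928465)^2 = 0.023560026 \\
E_{total} &= E_{o1} + E_{o2} = 0.274811083 + 0.023560026 = 0.298371109
\end{aligned}
$$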

The Backwards Pass

Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and the network as a whole.

Output Layer

Consider w5. We want to know how much a change in w5 affects the total error, aka ∂E_total/∂w5.

∂E_total/∂w5 is read as “the partial derivative of E_total with respect to w5”. You can also say “the gradient with respect to w5”.

By applying the chain rule we know that ∂E_total/∂w5 = ∂E_total/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂w5.

Visually, here’s what we’re doing:

We need to figure out each piece in this equation.

First, how much does the total error change with respect to the output? Differentiating E_total = (1/2)(target_o1 - out_o1)^2 + (1/2)(target_o2 - out_o2)^2 with respect to out_o1 gives ∂E_total/∂out_o1 = -(target_o1 - out_o1).

-(target_o1 - out_o1) is sometimes expressed as out_o1 - target_o1.

When we take the partial derivative of the total error with respect to out_o1, the quantity (1/2)(target_o2 - out_o2)^2 becomes zero because out_o1 does not affect it, which means we’re taking the derivative of a constant, which is zero.

Next, how much does the output of o1 change with respect to its total net input?

The partial derivative of the logistic function is the output multiplied by 1 minus the output: ∂out_o1/∂net_o1 = out_o1(1 - out_o1).

Finally, how much does the total net input of o1 change with respect to w5?

Putting it all together:
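
Each piece, using the forward-pass values computed earlier:

$$
\begin{aligned}
\frac{\partial E_{total}}{\partial out_{o1}} &= -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507 \\
\frac{\partial out_{o1}}{\partial net_{o1}} &= out_{o1}(1 - out_{o1}) = 0.75136507 \times (1 - 0.75136507) = 0.186815602 \\
\frac{\partial net_{o1}}{\partial w_5} &= out_{h1} = 0.593269992 \\
\frac{\partial E_{total}}{\partial w_5} &= 0.74136507 \times 0.186815602 \times 0.593269992 = 0.082167041
\end{aligned}
$$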

You’ll often see this calculation combined in the form of the delta rule: ∂E_total/∂w5 = -(target_o1 - out_o1) × out_o1(1 - out_o1) × out_h1.

Alternatively, we have ∂E_total/∂out_o1 and ∂out_o1/∂net_o1, whose product can be written as ∂E_total/∂net_o1, aka δ_o1 (the Greek letter delta), aka the node delta. We can use this to rewrite the calculation above: δ_o1 = ∂E_total/∂out_o1 × ∂out_o1/∂net_o1 = -(target_o1 - out_o1) × out_o1(1 - out_o1).

Therefore ∂E_total/∂w5 = δ_o1 × out_h1.

Some sources extract the negative sign from δ, so it would be written as ∂E_total/∂w5 = -δ_o1 × out_h1.

/* The gradient of each weight equals the output of the previous-layer node it connects from (here, out_h1) multiplied by the back-propagated value of the next-layer node it connects to (here, δ_o1). */

To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate, eta, which we’ll set to 0.5):
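
With η = 0.5 and the gradient computed above:

$$
w_5^{+} = w_5 - \eta \cdot \frac{\partial E_{total}}{\partial w_5} = 0.40 - 0.5 \times 0.082167041 = 0.35891648
$$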

Some sources use α (alpha) to represent the learning rate, others use η (eta), and others even use ε (epsilon).

We can repeat this process to get the new weights w6, w7, and w8:
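
Carried out the same way:

$$
w_6^{+} = 0.408666186, \qquad w_7^{+} = 0.511301270, \qquad w_8^{+} = 0.561370121
$$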

We perform the actual updates in the neural network after we have the new weights leading into the hidden layer neurons (i.e., we use the original weights, not the updated weights, when we continue the backpropagation algorithm below).

Hidden Layer

Next, we’ll continue the backwards pass by calculating new values for w1, w2, w3, and w4.

Big picture, here’s what we need to figure out: ∂E_total/∂w1 = ∂E_total/∂out_h1 × ∂out_h1/∂net_h1 × ∂net_h1/∂w1.

Visually:

We’re going to use a similar process as we did for the output layer, but slightly different to account for the fact that the output of each hidden layer neuron contributes to the output (and therefore error) of multiple output neurons. We know that out_h1 affects both out_o1 and out_o2, therefore ∂E_total/∂out_h1 needs to take into consideration its effect on both output neurons: ∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1.

Starting with ∂E_o1/∂out_h1: we can calculate ∂E_o1/∂net_o1 using values we calculated earlier, and ∂net_o1/∂out_h1 is simply equal to w5, so we can plug those in. Following the same process for ∂E_o2/∂out_h1 and adding the two terms gives ∂E_total/∂out_h1.
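
Worked out with the values from the output-layer pass:

$$
\begin{aligned}
\frac{\partial E_{o1}}{\partial out_{h1}} &= \frac{\partial E_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial out_{h1}} = 0.138498562 \times 0.40 = 0.055399425 \\
\frac{\partial E_{o2}}{\partial out_{h1}} &= \frac{\partial E_{o2}}{\partial net_{o2}} \times \frac{\partial net_{o2}}{\partial out_{h1}} = -0.038098236 \times 0.50 = -0.019049119 \\
\frac{\partial E_{total}}{\partial out_{h1}} &= 0.055399425 - 0.019049119 = 0.036350306
\end{aligned}
$$

Here ∂E_o1/∂net_o1 = 0.74136507 × 0.186815602 = 0.138498562 is the δ_o1 from the output-layer step, and ∂E_o2/∂net_o2 = -0.038098236 is the corresponding value computed for o2.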

Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and then ∂net_h1/∂w for each weight.

We calculate the partial derivative of the total net input to h1 with respect to w1 the same way as we did for the output neuron: ∂net_h1/∂w1 = i1.

Putting it all together:
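
With the numbers from the steps above:

$$
\begin{aligned}
\frac{\partial out_{h1}}{\partial net_{h1}} &= out_{h1}(1 - out_{h1}) = 0.593269992 \times (1 - 0.593269992) = 0.241300709 \\
\frac{\partial net_{h1}}{\partial w_1} &= i_1 = 0.05 \\
\frac{\partial E_{total}}{\partial w_1} &= 0.036350306 \times 0.241300709 \times 0.05 = 0.000438568
\end{aligned}
$$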

You might also see this written as ∂E_total/∂w1 = δ_h1 × i1, where δ_h1 = ∂E_total/∂net_h1 = ∂E_total/∂out_h1 × ∂out_h1/∂net_h1.

/* The gradient of each weight equals the output of the previous-layer node it connects from (i.e. i1) multiplied by the back-propagated value of the next-layer node it connects to (i.e. δh1); computing δh1 layer by layer is the key. */

We can now update w1:
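
With the same learning rate η = 0.5:

$$
w_1^{+} = w_1 - \eta \cdot \frac{\partial E_{total}}{\partial w_1} = 0.15 - 0.5 \times 0.000438568 = 0.149780716
$$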

Repeating this for w2, w3, and w4:
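
The same procedure gives:

$$
w_2^{+} = 0.199561432, \qquad w_3^{+} = 0.249751144, \qquad w_4^{+} = 0.299502287
$$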

Finally, we’ve updated all of our weights! When we fed forward the 0.05 and 0.1 inputs originally, the error on the network was 0.298371109. After this first round of backpropagation, the total error is now down to 0.291027924. It might not seem like much, but after repeating this process 10,000 times, for example, the error plummets to 0.000035085. At this point, when we feed forward 0.05 and 0.1, the two output neurons generate 0.015912196 (vs 0.01 target) and 0.984065734 (vs 0.99 target).
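
If you want to check the arithmetic end to end, here is a minimal self-contained Python sketch of the same procedure. It is not the script from the Github repo mentioned above, just a small reimplementation of this worked example, so treat the variable names as illustrative:

```python
import math

# Initial values matching the walkthrough above.
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w1..w8
b1, b2 = 0.35, 0.60  # biases are kept fixed, as in the example
eta = 0.5

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w):
    out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
    out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

for step in range(10000):
    out_h1, out_h2, out_o1, out_o2 = forward(w)

    # Output-layer node deltas: dE_total/dnet_o = (out - target) * out * (1 - out)
    d_o1 = (out_o1 - t1) * out_o1 * (1 - out_o1)
    d_o2 = (out_o2 - t2) * out_o2 * (1 - out_o2)

    # Hidden-layer node deltas: back-propagate the output deltas through w5..w8
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * out_h2 * (1 - out_h2)

    # Gradient of each weight = (output of the node it comes from) * (delta of the node it feeds)
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    w = [wi - eta * gi for wi, gi in zip(w, grads)]

out_h1, out_h2, out_o1, out_o2 = forward(w)
error = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(out_o1, out_o2, error)  # roughly 0.0159, 0.9841, 0.000035
```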

Summary:

1. The gradient of each weight equals the output of the previous-layer node it connects from multiplied by the back-propagated value (the delta) of the next-layer node it connects to. This conclusion is important enough to repeat three times!

2. New weight = old weight - η × (gradient of the total error with respect to that weight), e.g. w1+ = w1 - η × ∂E_total/∂w1.

3. Reference post: http://blog.csdn.net/zhongkejingwang/article/details/44514073

