Introduction to neural network
不能自欺欺人. 实干兴邦,空谈误国。
-----------------------------------------------------------------------------------------------------------------------------------------------
题外话:
Matlab的神经网络工具箱使得神经网络得到大力的推广,得到很多其它的人关注与研究使用. 然而,非常多人也就是简单调用matlab的API而已。别人高手已经帮你写好了全部相关的处理函数。自己调用一下API就是了。有时候甚至都不用写代码。直接打开matlab工具箱就能用.
这极大的便利了做理论研究的人,可是,还有一方面。这么好的封装使得用户傻瓜化。学习者不再探究神经网络的实现,进而依赖于matlab。这也往往把一个人给毁了....
-------------------------------------------------------------------------------------------------------------------------------------------------
建议先跳到最后瞥一眼实现代码,然后再从这里看 :)
一 . 什么是神经网络?
Humans and other animals process information with neural networks. These are formed from trillions of neurons (nerve cells) exchanging brief electrical pulses called action potentials. Computer algorithms that
mimic these biological structures are formally called artificial neural networks to distinguish them from the squishy things inside of animals. However, most scientists and engineers are not this formal and use the term neural network to
include both biological and nonbiological systems.
This neural network is formed in three layers, called the input layer,hidden layer, andoutput layer. Each
layer consists of one or more nodes, represented in this diagram by the small circles. The lines between the nodes indicate the flow of information from one node to the next. In this particular type of neural network, the information flows only from the input
to the output (that is, from left-to-right). Other types of neural networks have more intricate connections, such as feedback paths.
神经网络事实上就是 “非线性方程组基于随机概率的暴力近似求解”,这是我的理解。至于为什么,单扯理论是讲不明确的,来个实际的样例会好非常多.
二. 神经网络的简单实现——检測数字
这里又要感谢开明的washington university。
我的程序依据他们的demo改变而来,原来他们的程序測试输入单一,我对程序进行了简单的改进,希望能更好的阐述nerual network :)
这里demo要求实现的功能就是同意用户输入这种10*5的点阵(大写字母O以及空格组成),图中样例为数字0 和数字1数2的点整表示形式,计算机原本是不会识别这种点阵的,就像计算机原本无法识别人脸。这些点阵的排布差点儿不能用数学方程式表示出来。是典型的非线性模型.
首先我们对我们的对象——神经网络,进行数据抽象.
须要说明的是,这里不过介绍神经网络的实现。使用的是最最简单的神经网络,没有中间层,
仅仅有输入(InputLayyer)输出(OutputLayyer)两层。 INT REAL 各自是int 和double类型的数据(不得不说对于C的抽象封装来说typedef简直棒呆~)
typedef struct { /* A LAYER OF A NET: */ INT Units; /* - number of units in this layer */ REAL* Activation; /* - activation of ith unit */ INT* Output; /* - output of ith unit */ REAL* Error; /* - error term of ith unit */ REAL** Weight; /* - connection weights to ith unit */ } LAYER; typedef struct { /* A NET: */ LAYER* InputLayer; /* - input layer */ LAYER* OutputLayer; /* - output layer */ REAL Eta; /* - learning rate */ REAL Error; /* - total net error */ REAL Epsilon; /* - net error to terminate training */ } NET;
这个实现的层和网络的各个子成员后面还会又介绍.
这里是基于随机概率产生某个系数的函数实现,例如说RandomEqualINT(),这个函数就是随机的产生一个位于[ Low,High ]之间的整数。而RandEqualREAL则是产生一个实数(double类型嘛~)
void InitializeRandoms() { srand(4711); } INT RandomEqualINT(INT Low, INT High) { return rand() % (High-Low+1) + Low; } REAL RandomEqualREAL(REAL Low, REAL High) { return ((REAL) rand() / RAND_MAX) * (High-Low) + Low; }
Pattern数组定义了NUM_DATA ( 10 )个“二维点阵”。用来表示象形的阿拉伯数字1~9
宏定义X Y各自是这里点阵的列数和行数,N是一个点阵内全部的点的数目
网络是X×Y(35)个元素输入,M(10)个元素输出.
watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvY2lubXloZWFydA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" >
InitializeApplication 实现了把二维向量转化为一维向量的功能
(就是把二维的点阵转化到一维去,例如说
0 1 0
0 1 0
0 1 0
这个点阵转换成 0 1 0 0 1 0 0 1 0
)
假设Pattern 点阵里面是储存的字符‘O’ 就把全局变量Input数组相应的位设置成HI (1) 否则设置成LO(-1)
这里有NUM_DATA个点阵,于是转化成了十个一维向量。组合起来就是全局变量Input
void InitializeApplication(NET* Net) { INT n,i,j; Net->Eta = 0.001; Net->Epsilon = 0.0001; for (n=0; n<NUM_DATA; n++) { for (i=0; i<Y; i++) { for (j=0; j<X; j++) { Input[n][i*X+j] = (Pattern[n][i][j] == 'O') ? HI : LO; } } } f = fopen("ADALINE.txt", "w"); }
RandomWeights用于初始化输出层的权重. 取-0.5到0.5之间是随机数,这里就是随机初始化而已. Don‘t panic :)
void RandomWeights(NET* Net) { INT i,j; for (i=1; i<=Net->OutputLayer->Units; i++) { for (j=0; j<=Net->InputLayer->Units; j++) { Net->OutputLayer->Weight[i][j] = RandomEqualREAL(-0.5, 0.5); } } }
介绍了部分函数。接下来介绍main函数的基本的部分。借此进而介绍其它没有介绍的函数. 遇到问题,解决这个问题.
能够看到之里有个BOOL类型的变量Stop,每次循环開始都会被赋值为TRUE(1).默认是要while停止循环的
while(NOT Stop)。NOT 是 。的宏定义. main函数内部的局部变量Error用于记录全网络的误差
一旦网络误差Net.Error小于网络求解精度要求Net.Epsilon,那么就停止训练。否则一直训练.
第一个for循环调用了SimulateNet(),注意这里 BOOL变量Traning 和Protocoling被置为FALSE(0)
void SimulateNet(NET* Net, INT* Input, INT* Target, BOOL Training, BOOL Protocoling) { INT Output[M]; SetInput(Net, Input, Protocoling); PropagateNet(Net); GetOutput(Net, Output, Protocoling); ComputeOutputError(Net, Target); if (Training) AdjustWeights(Net); }
调用了三个函数
SetInput(), 还是要记得,这里我们在第一个for循环的时候,Protocoling是FALSE.
这里把当前输入向量Input_current_vector,作为当前网络的输入层的输出。
void SetInput(NET* Net, INT* Input_current_vector, BOOL Protocoling) { INT i; for (i=1; i<=Net->InputLayer->Units; i++) { Net->InputLayer->Output[i] = Input_current_vector[i-1]; } if (Protocoling) { WriteInput(Net, Input_current_vector); } }
Net->OutputLayer->Weight[i][j] 这是随机初始化得来的权重(后面会依据偏差进行不断的修正。直达达到误差精度要求).
Propagate(), 对把每一个输入进行不同权重的组合相加,然后得到一个值Sum。大于0,取HI,小于0,取LO赋值给输出层的输出.
void PropagateNet(NET* Net) { INT i,j; REAL Sum; for (i=1; i<=Net->OutputLayer->Units; i++) { Sum = 0; for (j=0; j<=Net->InputLayer->Units; j++) { Sum += Net->OutputLayer->Weight[i][j] * Net->InputLayer->Output[j]; } Net->OutputLayer->Activation[i] = Sum; if (Sum >= 0) Net->OutputLayer->Output[i] = HI; else Net->OutputLayer->Output[i] = LO; } }
把当前网络输出Net->OutputLayer->Output输出到输出向量Output_current_vector其中.
GetOutput().
void GetOutput(NET* Net, INT* Output_current_vector, BOOL Protocoling) { INT i; for (i=1; i<=Net->OutputLayer->Units; i++) { Output_current_vector[i-1] = Net->OutputLayer->Output[i]; } if (Protocoling) { WriteOutput(Net, Output_current_vector); } }
ComputeOutputError()把输出层的输出和Target(训练目标,这里是矩阵Output)做差值,作为Err和输出层的Error[i]
网络的Error被更新为 原有误差 + 0.5×sqr(Err)。
void ComputeOutputError(NET* Net, INT* Target) { INT i; REAL Err; Net->Error = 0; for (i=1; i<=Net->OutputLayer->Units; i++) { Err = Target[i-1] - Net->OutputLayer->Activation[i]; Net->OutputLayer->Error[i] = Err; Net->Error += 0.5 * sqr(Err); } }
接着main函数内部第二层for循环对网络进行训练,注意这里SImulateNet函数的第四个參数Traning变成了TURE
for (m=0; m<10*NUM_DATA; m++) { n = RandomEqualINT(0, NUM_DATA-1); SimulateNet(&Net, Input[n], Output[n], TRUE, FALSE); }
于是会调用函数AdjustWeights(),这才是反馈的过程 调整输出层的权重,详细方法是当前权重加上输出偏差Error[i]*输入层的输出Output[i]×精度Net->Eta.
可能会又疑问,为什么这么干?
void AdjustWeights(NET* Net) { INT i,j; INT Out; REAL Err; for (i=1; i<=Net->OutputLayer->Units; i++) { for (j=0; j<=Net->InputLayer->Units; j++) { Out = Net->InputLayer->Output[j]; Err = Net->OutputLayer->Error[i]; Net->OutputLayer->Weight[i][j] += Net->Eta * Err * Out; } } }
回忆输出层的Error[i]假设大于0标志着。当前输出小于期望输出,假设小于0。反之。
就说明要加大该输入点j相应的输出点i的权重。以缩小实际输出和期望的差值(误差从大于0到趋近于0).
同理,假设当前输出层的输出比期望输出要大,那么输出层的Error[i]就是个负数,以减小相应节点间的权重,缩小误差.(误差从小于0趋向于0)
AHa~框架就是这样。剩下的部分都是測试,基本上把主要重要的代码都讲清楚了.接下来我们能够开心的玩測试了.
我有益对输入造成一定的误差(点阵数字都是有标准定义的Parttern数组)。得到的检測结果,事实上感觉效果还蛮好的。嘿嘿,具有一定的容错性。
毕竟都没有中间层。神经网络的鲁棒性还没有发挥出来,神经网络的中间层越多,鲁棒性就越强,同一时候计算量会增大,时间消耗加大.
watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvY2lubXloZWFydA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" >
以下是我经过改动过的代码:
/********************************************************* Code writer : EOF Code date : 2014.10.24 Code file : adaline_netwrok.c e-mail: : [email protected] Code Description: This program is a demo for beginner to understand nerual network. If there is something wrong with my code, please touch me by e-mail. **********************************************************/ #include <stdlib.h> #include <stdio.h> typedef int BOOL; typedef char CHAR; typedef int INT; typedef double REAL; #define FALSE 0 #define TRUE 1 #define NOT ! #define AND && #define OR || #define MIN(x,y) ((x)<(y) ? (x) : (y)) #define MAX(x,y) ((x)>(y) ? (x) : (y)) #define LO -1 #define HI +1 #define BIAS 1 #define sqr(x) ((x)*(x)) typedef struct { /* A LAYER OF A NET: */ INT Units; /* - number of units in this layer */ REAL* Activation; /* - activation of ith unit */ INT* Output; /* - output of ith unit */ REAL* Error; /* - error term of ith unit */ REAL** Weight; /* - connection weights to ith unit */ } LAYER; typedef struct { /* A NET: */ LAYER* InputLayer; /* - input layer */ LAYER* OutputLayer; /* - output layer */ REAL Eta; /* - learning rate */ REAL Error; /* - total net error */ REAL Epsilon; /* - net error to terminate training */ } NET; /****************************************************************************** R A N D O M S D R A W N F R O M D I S T R I B U T I O N S ******************************************************************************/ void InitializeRandoms() { srand(4711); } INT RandomEqualINT(INT Low, INT High) { return rand() % (High-Low+1) + Low; } REAL RandomEqualREAL(REAL Low, REAL High) { return ((REAL) rand() / RAND_MAX) * (High-Low) + Low; } /****************************************************************************** A P P L I C A T I O N - S P E C I F I C C O D E ******************************************************************************/ #define NUM_DATA 10 #define X 5 #define Y 7 #define N (X * Y) #define M 10 CHAR Pattern[NUM_DATA][Y][X] = { { " OOO ", "O O", "O O", "O O", "O O", "O O", " OOO " }, { " O ", " OO ", "O O ", " O ", " O ", " O ", " O " }, { " OOO ", "O O", " O", " O ", " O ", " O ", "OOOOO" }, { " OOO ", "O O", " O", " OOO ", " O", "O O", " OOO " }, { " O ", " OO ", " O O ", "O O ", "OOOOO", " O ", " O " }, { "OOOOO", "O ", "O ", "OOOO ", " O", "O O", " OOO " }, { " OOO ", "O O", "O ", "OOOO ", "O O", "O O", " OOO " }, { "OOOOO", " O", " O", " O ", " O ", " O ", "O " }, { " OOO ", "O O", "O O", " OOO ", "O O", "O O", " OOO " }, { " OOO ", "O O", "O O", " OOOO", " O", "O O", " OOO " } }; CHAR Pattern_for_testing[NUM_DATA][Y][X] = { { " OOO ", " ", " O", " O", " O", " O", " O " }, { " O ", " O ", " O ", " O ", " O ", " O ", " O "}, { " OOO ", "O O", " O", " O ", " O ", " O ", "OOOOO" }, { " OOO ", "O O", "O O", "OOOOO", " O", "O O", " OOO " }, { " O ", " OO ", " O O ", "O O ", "OOOOO", " ", " " }, { "OOOOO", "O ", "O ", "OOOOO", "O O", "O O", "OOOOO" }, { " OOO ", "O O", "O ", "OOOO ", "O O", "O O", " OOO " }, { "OOOOO", " O", " O", " O ", " O ", " O ", "O " }, { " OOO ", "O O", "O O", " ", "O O", "O O", " OOO " }, { " OOO ", "O O", "O O", "O O", "O O", "O O", " OOO " } }; INT Input [NUM_DATA][N]; INT Input_for_testing [NUM_DATA][N];//EOF added INT Output[NUM_DATA][M] = { {HI, LO, LO, LO, LO, LO, LO, LO, LO, LO}, {LO, HI, LO, LO, LO, LO, LO, LO, LO, LO}, {LO, LO, HI, LO, LO, LO, LO, LO, LO, LO}, {LO, LO, LO, HI, LO, LO, LO, LO, LO, LO}, {LO, LO, LO, LO, HI, LO, LO, LO, LO, LO}, {LO, LO, LO, LO, LO, HI, LO, LO, LO, LO}, {LO, LO, LO, LO, LO, LO, HI, LO, LO, LO}, {LO, LO, LO, LO, LO, LO, LO, HI, LO, LO}, {LO, LO, LO, LO, LO, LO, LO, LO, HI, LO}, {LO, LO, LO, LO, LO, LO, LO, LO, LO, HI} }; FILE* f; void InitializeApplication(NET* Net) { INT n,i,j; Net->Eta = 0.001; Net->Epsilon = 0.0001; for (n=0; n<NUM_DATA; n++) { for (i=0; i<Y; i++) { for (j=0; j<X; j++) { Input[n][i*X+j] = (Pattern[n][i][j] == 'O') ? HI : LO; /* ** EOF added */ Input_for_testing[n][i*X+j] = (Pattern_for_testing[n][i][j] == 'O') ? HI : LO; } } } f = fopen("ADALINE.txt", "w"); } void WriteInput(NET* Net, INT* Input_current_vector) { INT i; for (i=0; i<N; i++) { if (i%X == 0) { fprintf(f, "\n"); } fprintf(f, "%c", (Input_current_vector[i] == HI) ? 'O' : ' '); } fprintf(f, " -> "); } void WriteOutput(NET* Net, INT* Output_current_vector) { INT i; INT Count, Index; Count = 0; for (i=0; i<M; i++) { if (Output_current_vector[i] == HI) { Count++; Index = i; } } if (Count == 1) fprintf(f, "%i\n", Index); else fprintf(f, "%s\n", "invalid"); } void FinalizeApplication(NET* Net) { fclose(f); } /****************************************************************************** I N I T I A L I Z A T I O N ******************************************************************************/ void GenerateNetwork(NET* Net) { INT i; Net->InputLayer = (LAYER*) malloc(sizeof(LAYER)); Net->OutputLayer = (LAYER*) malloc(sizeof(LAYER)); Net->InputLayer->Units = N; Net->InputLayer->Output = (INT*) calloc(N+1, sizeof(INT)); Net->InputLayer->Output[0] = BIAS; Net->OutputLayer->Units = M; Net->OutputLayer->Activation = (REAL*) calloc(M+1, sizeof(REAL)); Net->OutputLayer->Output = (INT*) calloc(M+1, sizeof(INT)); Net->OutputLayer->Error = (REAL*) calloc(M+1, sizeof(REAL)); Net->OutputLayer->Weight = (REAL**) calloc(M+1, sizeof(REAL*)); for (i=1; i<=M; i++) { Net->OutputLayer->Weight[i] = (REAL*) calloc(N+1, sizeof(REAL)); } Net->Eta = 0.1; Net->Epsilon = 0.01; } void RandomWeights(NET* Net) { INT i,j; for (i=1; i<=Net->OutputLayer->Units; i++) { for (j=0; j<=Net->InputLayer->Units; j++) { Net->OutputLayer->Weight[i][j] = RandomEqualREAL(-0.5, 0.5); } } } void SetInput(NET* Net, INT* Input_current_vector, BOOL Protocoling) { INT i; for (i=1; i<=Net->InputLayer->Units; i++) { Net->InputLayer->Output[i] = Input_current_vector[i-1]; } if (Protocoling) { WriteInput(Net, Input_current_vector); } } void GetOutput(NET* Net, INT* Output_current_vector, BOOL Protocoling) { INT i; for (i=1; i<=Net->OutputLayer->Units; i++) { Output_current_vector[i-1] = Net->OutputLayer->Output[i]; } if (Protocoling) { WriteOutput(Net, Output_current_vector); } } /****************************************************************************** P R O P A G A T I N G S I G N A L S ******************************************************************************/ void PropagateNet(NET* Net) { INT i,j; REAL Sum; for (i=1; i<=Net->OutputLayer->Units; i++) { Sum = 0; for (j=0; j<=Net->InputLayer->Units; j++) { Sum += Net->OutputLayer->Weight[i][j] * Net->InputLayer->Output[j]; } Net->OutputLayer->Activation[i] = Sum; if (Sum >= 0) Net->OutputLayer->Output[i] = HI; else Net->OutputLayer->Output[i] = LO; } } /****************************************************************************** A D J U S T I N G W E I G H T S ******************************************************************************/ void ComputeOutputError(NET* Net, INT* Target) { INT i; REAL Err; Net->Error = 0; for (i=1; i<=Net->OutputLayer->Units; i++) { Err = Target[i-1] - Net->OutputLayer->Activation[i]; Net->OutputLayer->Error[i] = Err; Net->Error += 0.5 * sqr(Err); } } void AdjustWeights(NET* Net) { INT i,j; INT Out; REAL Err; for (i=1; i<=Net->OutputLayer->Units; i++) { for (j=0; j<=Net->InputLayer->Units; j++) { Out = Net->InputLayer->Output[j]; Err = Net->OutputLayer->Error[i]; Net->OutputLayer->Weight[i][j] += Net->Eta * Err * Out; } } } /****************************************************************************** S I M U L A T I N G T H E N E T ******************************************************************************/ void SimulateNet(NET* Net, INT* Input, INT* Target, BOOL Training, BOOL Protocoling) { INT Output[M]; SetInput(Net, Input, Protocoling); PropagateNet(Net); GetOutput(Net, Output, Protocoling); ComputeOutputError(Net, Target); if (Training) AdjustWeights(Net); } /****************************************************************************** M A I N ******************************************************************************/ void main() { NET Net; REAL Error; BOOL Stop; INT n,m; InitializeRandoms(); GenerateNetwork(&Net); RandomWeights(&Net); InitializeApplication(&Net); do { Error = 0; Stop = TRUE; for (n=0; n<NUM_DATA; n++) { SimulateNet(&Net, Input[n], Output[n], FALSE, FALSE); Error = MAX(Error, Net.Error); Stop = Stop AND (Net.Error < Net.Epsilon); } Error = MAX(Error, Net.Epsilon); printf("Training %0.0f%% completed ...\n", (Net.Epsilon / Error) * 100); if (NOT Stop) { for (m=0; m<10*NUM_DATA; m++) { n = RandomEqualINT(0, NUM_DATA-1); SimulateNet(&Net, Input[n], Output[n], TRUE, FALSE); } } } while (NOT Stop); for (n=0; n<NUM_DATA; n++) { SimulateNet(&Net, Input_for_testing[n], Output[n], FALSE, TRUE); } FinalizeApplication(&Net); }
update : 2014.10.27
后话:
神经网络是用来“尝试”搞定眼下数学水平无法搞定的问题,其设计思想“学习”里就隐含着让算法自己主动执行,人类不再过多干预算法的处理过程,就像一台机器。你按下“開始”button即可了。其它的。他自己会去完毕。
而所谓的学习,是一个不断自我修正的过程,不断的调整网络内部的系数(无论怎么样,一旦上了项目,这里的计算量就大的吓人),从而不断的缩小期望输出与实际输出的差值,直到最后满足误差精度要求,即刻停止“学习”。
典型的项目——人脸识别,预測类的问题,都能够用上神经网络。我开blog开头说过。神经网络的实质就是非线性问题基于概率模型的暴力求解.
update: 2015.02.01
@XuGuowei 提到,这里为什么要把初始的參数随机化。
我想这里是为了让训练网络用的平均时间最短。
假设Viewer写过快排(Quick Sort)或者对快排有一定认识的话会发现,快排的时间复杂度是平均时间,而不是和堆排一样的最坏情况O(n lgn). 还有,二叉树,一般的最简单的二叉搜索树(BST, binary search tree),非常可能是不平衡的,这样搜索的时间复杂度可能就不够好,而随机化生成的二叉搜索树的左右子树的高度则会”尽可能的平衡“,从而保证搜索的时候时间复杂度不会非常差,由于是随机的。任意时间复杂度是平均化的时间。
假设这里初始化的值都是0,或者某些极端的或特殊的值。那么往往不利于网络的收敛。
watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvY2lubXloZWFydA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" >
版权声明:本文博主原创文章,博客,未经同意不得转载。