Summary: Different Methods for Weight Initialization in Deep Learning

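This post summarizes three weight initialization methods. The first two are fairly common; the last one is the most recent. It is written in English because it was originally prepared for a non-Chinese-speaking reader. Additions and corrections are welcome.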

Please respect the original work. If you repost, cite the source: http://blog.csdn.net/tangwei2014

1. Gaussian

Weights are randomly drawn from Gaussian distributions with fixed mean (e.g., 0) and fixed standard deviation (e.g., 0.01).

This is the most common initialization method in deep learning.
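As a rough illustration of the description above, the following sketch draws a weight matrix from a Gaussian with mean 0 and standard deviation 0.01. The function name `gaussian_init`, the layer sizes, and the seed are assumptions made for this example only; it uses just the Python standard library to stay dependency-free, whereas real code would normally use an array library.

```python
import random


def gaussian_init(n_out, n_in, mean=0.0, std=0.01, seed=0):
    """Return an n_out x n_in weight matrix (list of rows) drawn from N(mean, std^2)."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return [[rng.gauss(mean, std) for _ in range(n_in)] for _ in range(n_out)]


# Hypothetical layer: 128 inputs, 64 outputs.
weights = gaussian_init(n_out=64, n_in=128)

# Check that the empirical statistics are close to the requested ones.
flat = [w for row in weights for w in row]
sample_mean = sum(flat) / len(flat)
sample_std = (sum((w - sample_mean) ** 2 for w in flat) / len(flat)) ** 0.5
print(f"mean={sample_mean:.5f}, std={sample_std:.5f}")
```

Note that the standard deviation here is the same for every layer, regardless of how many inputs the layer has.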

2. Xavier

This method initializes the weights from a properly scaled uniform or Gaussian distribution [1].

Caffe (an open framework for deep learning) [2] initializes the weights in a network by drawing them from a distribution with zero mean and a specific variance,

$$\mathrm{Var}(W) = \frac{1}{n_{\mathrm{in}}}$$

where W is the initialization distribution for the neuron in question, and n_in is the number of neurons feeding into it. The distribution used is typically Gaussian or uniform.

Glorot & Bengio’s paper [1] originally recommended using

$$\mathrm{Var}(W) = \frac{2}{n_{\mathrm{in}} + n_{\mathrm{out}}}$$

where n_out is the number of neurons the result is fed to.
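The two variants can be sketched as follows, assuming the fan-in-only rule targets a variance of 1 / n_in and the Glorot & Bengio rule targets 2 / (n_in + n_out). The function names, layer sizes, and seeds are hypothetical, and this is not Caffe's actual implementation. For the uniform case, a distribution on [-a, a] has variance a^2 / 3, so a = sqrt(6 / (n_in + n_out)) yields the target variance.

```python
import math
import random


def xavier_fan_in_gaussian(n_out, n_in, seed=0):
    """Zero-mean Gaussian with Var(W) = 1 / n_in (fan-in-only variant)."""
    rng = random.Random(seed)
    std = math.sqrt(1.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def xavier_glorot_uniform(n_out, n_in, seed=0):
    """Uniform on [-limit, limit] with Var(W) = 2 / (n_in + n_out)."""
    rng = random.Random(seed)
    # Var of U(-a, a) is a^2 / 3, hence a = sqrt(6 / (n_in + n_out)).
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]


def variance(matrix):
    """Population variance over all entries of a list-of-rows matrix."""
    flat = [w for row in matrix for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)


# Hypothetical layer: 256 inputs, 128 outputs.
n_in, n_out = 256, 128
var_fan_in = variance(xavier_fan_in_gaussian(n_out, n_in))
var_glorot = variance(xavier_glorot_uniform(n_out, n_in))
print(f"fan-in variant: {var_fan_in:.5f} (target {1.0 / n_in:.5f})")
print(f"Glorot variant: {var_glorot:.5f} (target {2.0 / (n_in + n_out):.5f})")
```

Unlike a fixed standard deviation, both rules shrink the weights as the layer gets wider.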

References:

[1] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.

[2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.

3. MSRA

This method was proposed to make it possible to train extremely deep rectified models directly from scratch [1].

In this method, weights are initialized with a zero-mean Gaussian distribution whose standard deviation is

$$\sqrt{\frac{2}{k_l^2 \, d_{l-1}}}$$

where k_l is the spatial filter size in layer l and d_{l-1} is the number of filters in layer l-1.
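A minimal sketch of this rule, assuming the standard deviation is sqrt(2 / (k^2 * d_prev)) for k x k filters applied over d_prev input channels. The names `msra_std` and `msra_init_conv`, the flat-list filter layout, and the layer sizes are assumptions made for illustration only.

```python
import math
import random


def msra_std(k, d_prev):
    """std = sqrt(2 / (k^2 * d_prev)) for a k x k filter over d_prev input channels."""
    return math.sqrt(2.0 / (k * k * d_prev))


def msra_init_conv(d_out, d_prev, k, seed=0):
    """Return d_out filters, each a flat list of k*k*d_prev zero-mean Gaussian weights."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    std = msra_std(k, d_prev)
    fan_in = k * k * d_prev
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(d_out)]


# Hypothetical layer: 128 filters of size 3x3 over 64 input channels.
std = msra_std(k=3, d_prev=64)
filters = msra_init_conv(d_out=128, d_prev=64, k=3)
print(f"std = {std:.4f}")
```

For this hypothetical layer the fan-in is 3 * 3 * 64 = 576, so the standard deviation is sqrt(2 / 576), roughly 0.0589. That is a factor of sqrt(2) larger than the fan-in-only rule with variance 1 / 576 would give.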

Reference:

[1] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Technical report, arXiv, Feb. 2015.

Copyright notice: This is an original article by the blog author and may not be reposted without the author's permission.

Time: 2024-10-17 20:56:42
