Today I originally wanted to pay my respects, so I dug up the ancient backpropagation paper published in Nature, but I couldn't get through it... So instead I pulled out the 2015 Nature paper "Deep Learning", which is essentially a review, to read. Its citations also feel important, so this paper, a high-value hub in the citation network, is worth studying.
Related resources:
Original English text:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.894&rep=rep1&type=pdf
Chinese translations:
http://www.csdn.net/article/2015-06-01/2824811
http://www.csdn.net/article/2015-06-02/2824825
Visualization resources:
http://colah.github.io/
The abstract says that deep learning has achieved excellent results across many fields in recent years.
The first paragraph lays out the big picture. The traditional ML workflow is: to do classification or similar tasks, you first have to hand-engineer features and then run the downstream task on them, but this requires a lot of domain expertise and is hard to get started with in engineering practice. Hence the field of representation learning: given the input data, learn features that make the target easy to discriminate, or in other words, transform the raw data into a representation that makes subsequent classification and other processing convenient. Deep learning is impressive here: even knowing nothing in advance, it can still extract different features at different levels of abstraction and learn from them. By now it is widely applied across all kinds of fields.
Below are some of the recent references:
Image recognition
1. Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012).
This report was a breakthrough that used convolutional nets to almost halve the error rate for object recognition, and precipitated the rapid adoption of deep learning by the computer vision community.
2. Farabet, C., Couprie, C., Najman, L. & LeCun, Y. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1915–1929 (2013).
3. Tompson, J., Jain, A., LeCun, Y. & Bregler, C. Joint training of a convolutional network and a graphical model for human pose estimation. In Proc. Advances in Neural Information Processing Systems 27 1799–1807 (2014).
4. Szegedy, C. et al. Going deeper with convolutions. Preprint at http://arxiv.org/abs/1409.4842 (2014).
Speech recognition
5. Mikolov, T., Deoras, A., Povey, D., Burget, L. & Cernocky, J. Strategies for training large scale neural network language models. In Proc. Automatic Speech Recognition and Understanding 196–201 (2011).
6. Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29, 82–97 (2012).
This joint paper from the major speech recognition laboratories, summarizing the breakthrough achieved with deep learning on the task of phonetic classification for automatic speech recognition, was the first major industrial application of deep learning.
7. Sainath, T., Mohamed, A.-R., Kingsbury, B. & Ramabhadran, B. Deep convolutional neural networks for LVCSR. In Proc. Acoustics, Speech and Signal Processing 8614–8618 (2013).
Drug molecules
8. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
Particle accelerator data
9. Ciodaro, T., Deva, D., de Seixas, J. & Damazio, D. Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series 368, 012030 (2012).
10. Kaggle. Higgs boson machine learning challenge. Kaggle https://www.kaggle.com/c/higgs-boson (2014).
Reconstructing brain circuits
11. Helmstaedter, M. et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 500, 168–174 (2013).
Gene expression and disease
12. Leung, M. K., Xiong, H. Y., Lee, L. J. & Frey, B. J. Deep learning of the tissue-regulated splicing code. Bioinformatics 30, i121–i129 (2014).
13. Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 6218 (2015).
Natural language understanding
14. Collobert, R. et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).
Question answering
15. Bordes, A., Chopra, S. & Weston, J. Question answering with subgraph embeddings. In Proc. Empirical Methods in Natural Language Processing http://arxiv.org/abs/1406.3676v3 (2014).
Machine translation
16. Jean, S., Cho, K., Memisevic, R. & Bengio, Y. On using very large target vocabulary for neural machine translation. In Proc. ACL-IJCNLP http://arxiv.org/abs/1412.2007 (2015).
17. Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems 27 3104–3112 (2014).
This paper showed state-of-the-art machine translation results with the architecture introduced in ref. 72, with a recurrent network trained to read a sentence in one language, produce a semantic representation of its meaning, and generate a translation in another language.
The supervised learning section says that the old approach of hand-extracted features fed to a linear classifier (or a shallow nonlinear one) does not work very well; deep stacks of nonlinearities can extract invariant features while also picking the salient content out of the background.
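To make the limitation of linear classifiers concrete, here is a tiny illustration of my own (not from the paper): no linear decision boundary sign(w·x + b) can classify all four XOR-labeled points, which is exactly the kind of task where nonlinear hidden layers become necessary.

```python
import numpy as np

# XOR: the classic example of data no linear classifier can separate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Random search over linear classifiers sign(w.x + b): none of them
# gets all four XOR points right.
rng = np.random.default_rng(0)
best = 0.0
for _ in range(20000):
    w, b = rng.normal(size=2), rng.normal()
    acc = np.mean(((X @ w + b) > 0).astype(int) == y)
    best = max(best, acc)
print(best)  # tops out at 0.75; a single half-space split never reaches 1.0
```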
The trade-offs of laborious large-scale training
18. Bottou, L. & Bousquet, O. The tradeoffs of large scale learning. In Proc. Advances in Neural Information Processing Systems 20 161–168 (2007).
Linear classifiers split the input space into half-space regions
19. Duda, R. O. & Hart, P. E. Pattern Classification and Scene Analysis (Wiley, 1973).
Kernel methods
20. Schölkopf, B. & Smola, A. Learning with Kernels (MIT Press, 2002).
Gaussian kernels
21. Bengio, Y., Delalleau, O. & Le Roux, N. The curse of highly variable functions for local kernel machines. In Proc. Advances in Neural Information Processing Systems 18 107–114 (2005).
The section on multilayer architectures and backpropagation explains that networks can be trained with the backpropagation algorithm, but back in the 1990s people thought that inferring useful features from very little prior knowledge was nonsense, and that training would easily get stuck in poor local optima, so neural networks gradually fell out of favor. With big data, however, poor local optima turn out to be rare: starting from different initializations, the final solutions differ only slightly. Then, in the early 2000s, deep networks reignited the fight, because researchers in the CIFAR program used unsupervised learning to learn features with which to initialize the network, followed by backpropagation for fine-tuning, and the results were very good, especially on handwritten digit recognition and pedestrian detection. So the advice at the time: if you have plenty of labeled data, just train away; if labeled data is scarce, it is better to pre-train on unlabeled data first. Convolutional neural networks have also been on the rise in recent years, especially in computer vision.
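As a refresher on what backpropagation with gradient descent actually does, here is a minimal sketch of my own (the toy sizes and hyperparameters are made up, not taken from the paper): a two-layer network with a tanh hidden layer and a sigmoid output, trained by manually derived backprop on the XOR problem that the linear classifier above could not solve.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

# Two-layer net: 2 inputs -> 8 tanh hidden units -> 1 sigmoid output.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

lr = 0.5
for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)                   # hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # output probabilities
    # Backward pass (backpropagation). With cross-entropy loss and a
    # sigmoid output, the error at the output simplifies to (p - y).
    d_out = (p - y) / len(X)
    dW2 = h.T @ d_out; db2 = d_out.sum(0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)        # chain rule through tanh
    dW1 = X.T @ d_h; db1 = d_h.sum(0)
    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(p.ravel().round(2))  # approaches [0, 1, 1, 0]
```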
Early pattern recognition
22. Selfridge, O. G. Pandemonium: a paradigm for learning in mechanisation of thought processes. In Proc. Symposium on Mechanisation of Thought Processes 513–526 (1958).
23. Rosenblatt, F. The Perceptron — A Perceiving and Recognizing Automaton. Tech. Rep. 85-460-1 (Cornell Aeronautical Laboratory, 1957).
Training neural networks by simple stochastic gradient descent in the 1980s
24. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard Univ. (1974).
25. Parker, D. B. Learning Logic Report TR–47 (MIT Press, 1985).
26. LeCun, Y. Une procédure d'apprentissage pour Réseau à seuil assymétrique in Cognitiva 85: a la Frontière de l'Intelligence Artificielle, des Sciences de la Connaissance et des Neurosciences [in French] 599–604 (1985).
27. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Using ReLUs to avoid the need for unsupervised pre-training
28. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315–323 (2011).
This paper showed that supervised training of very deep neural networks is much faster if the hidden layers are composed of ReLUs.
There are hardly any bad local optima; instead there are saddle points
29. Dauphin, Y. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Proc. Advances in Neural Information Processing Systems 27 2933–2941 (2014).
30. Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B. & LeCun, Y. The loss surface of multilayer networks. In Proc. Conference on AI and Statistics http://arxiv.org/abs/1412.0233 (2014).
The revival of deep networks
31. Hinton, G. E. What kind of graphical model is the brain? In Proc. 19th International Joint Conference on Artificial Intelligence 1765–1775 (2005).
32. Hinton, G. E., Osindero, S. & Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comp. 18, 1527–1554 (2006).
This paper introduced a novel and effective way of training very deep neural networks by pre-training one hidden layer at a time using the unsupervised learning procedure for restricted Boltzmann machines.
33. Bengio, Y., Lamblin, P., Popovici, D. & Larochelle, H. Greedy layer-wise training of deep networks. In Proc. Advances in Neural Information Processing Systems 19 153–160 (2006).
This report demonstrated that the unsupervised pre-training method introduced in ref. 32 significantly improves performance on test data and generalizes the method to other unsupervised representation-learning techniques, such as auto-encoders.
34. Ranzato, M., Poultney, C., Chopra, S. & LeCun, Y. Efficient learning of sparse representations with an energy-based model. In Proc. Advances in Neural Information Processing Systems 19 1137–1144 (2006).
Unsupervised initialization, then backprop fine-tuning (refs 33 and 34 above, plus:)
35. Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Pre-training + fine-tuning on small data for handwritten digit recognition and pedestrian detection
36. Sermanet, P., Kavukcuoglu, K., Chintala, S. & LeCun, Y. Pedestrian detection with unsupervised multi-stage feature learning. In Proc. International Conference on Computer Vision and Pattern Recognition http://arxiv.org/abs/1212.0142 (2013).
Training on GPUs
37. Raina, R., Madhavan, A. & Ng, A. Y. Large-scale deep unsupervised learning using graphics processors. In Proc. 26th Annual International Conference on Machine Learning 873–880 (2009).
A major breakthrough in speech recognition (small datasets: ref 38; large datasets: ref 39)
38. Mohamed, A.-R., Dahl, G. E. & Hinton, G. Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20, 14–22 (2012).
39. Dahl, G. E., Yu, D., Deng, L. & Acero, A. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 33–42 (2012).
Pre-training on small datasets to prevent overfitting
40. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Machine Intell. 35, 1798–1828 (2013).
Convolutional neural networks
41. LeCun, Y. et al. Handwritten digit recognition with a back-propagation network. In Proc. Advances in Neural Information Processing Systems 396–404 (1990).
This is the first paper on convolutional networks trained by backpropagation for the task of classifying low-resolution images of handwritten digits.
42. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
This overview paper on the principles of end-to-end training of modular systems such as deep neural networks using gradient-based optimization showed how neural networks (and in particular convolutional nets) can be combined with search or inference mechanisms to model complex outputs that are interdependent, such as sequences of characters associated with the content of a document.
The convolutional neural network section also covers the classic layer types, such as convolutional layers and pooling layers, and the classic properties they rely on.
People usually explain the local connections of convolution (with shared weights) by saying that a local feature may also appear elsewhere in the image, but I feel this is really more about probability: what I am seeing is some probability distribution of where a pattern occurs.
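To pin down what these classic layers compute, here is a minimal NumPy sketch of my own (the toy edge kernel and sizes are illustrative assumptions, not from the paper): a single 2-D convolution, i.e. local connections with one shared kernel, followed by a ReLU and non-overlapping max pooling.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as is conventional in
    deep learning): one small kernel of shared weights is applied to
    every local patch of the image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response in each
    block, which buys a little translation invariance."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(0).random((8, 8))
edge_kernel = np.array([[1., -1.]])               # a toy edge-detector kernel
fmap = np.maximum(conv2d(image, edge_kernel), 0)  # convolution + ReLU
pooled = max_pool(fmap)
print(fmap.shape, pooled.shape)                   # (8, 7) -> (4, 3)
```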
Back to the point: the overall architecture is quite similar to the LGN-V1-V2-V4-IT hierarchy of the visual system. When a monkey and a ConvNet are shown the same picture, the ConvNet's high-level units account for about half of the variance of randomly sampled neurons in a certain region (the inferotemporal cortex) of the monkey's brain (roughly my translation). ConvNets have their roots in the neocognitron; the architectures are somewhat similar, but the neocognitron had no end-to-end supervised learning algorithm like backpropagation. A one-dimensional ConvNet, also called a time-delay neural network, can be used to recognize phonemes and simple words.
Back in the 1990s there were many applications of time-delay neural networks (1-D ConvNets), for example in speech recognition and document reading. The document reading system used a ConvNet trained jointly with a probabilistic model that implemented language constraints. By the late 1990s this system was reading over 10% of all checks. ConvNet-based optical character recognition and handwriting recognition were later developed by Microsoft. Also in the early 1990s, ConvNets were used for detection in natural images, such as face and hand detection, as well as face recognition.
Visual neurons inspired the convolutional and pooling layers
43. Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. J. Physiol. 160, 106–154 (1962).
44. Felleman, D. J. & Essen, D. C. V. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
A study comparing the high-level representations of ConvNets and monkey neurons viewing the same images
45. Cadieu, C. F. et al. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comp. Biol. 10, e1003963 (2014).
The relation between ConvNets and the neocognitron
46. Fukushima, K. & Miyake, S. Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition 15, 455–469 (1982).
1-D ConvNets (time-delay neural networks) for recognizing phonemes and simple words
47. Waibel, A., Hanazawa, T., Hinton, G. E., Shikano, K. & Lang, K. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process. 37, 328–339 (1989).
48. Bottou, L., Fogelman-Soulié, F., Blanchet, P. & Lienard, J. Experiments with time delay networks and dynamic time warping for speaker independent isolated digit recognition. In Proc. EuroSpeech 89 537–540 (1989).
Microsoft's optical character recognition and handwriting recognition
49. Simard, D., Steinkraus, P. Y. & Platt, J. C. Best practices for convolutional neural networks. In Proc. Document Analysis and Recognition 958–963 (2003).
Object detection in natural images
50. Vaillant, R., Monrocq, C. & LeCun, Y. Original approach for the localisation of objects in images. In Proc. Vision, Image, and Signal Processing 141, 245–250 (1994).
51. Nowlan, S. & Platt, J. in Neural Information Processing Systems 901–908 (1995).
Face recognition
52. Lawrence, S., Giles, C. L., Tsoi, A. C. & Back, A. D. Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks 8, 98–113 (1997).