Recurrent neural networks. The most exciting application of backpropagation has been to RNNs. For tasks with sequential input, such as language and speech, RNNs are the usual choice. An RNN processes a sequence one element at a time, keeping in its hidden units a state vector that implicitly encodes the history of the past elements of the sequence. If we view the outputs of the hidden units at different time steps as the outputs of different layers of a deep network, it becomes clear how backpropagation can be applied to train an RNN.
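To make the "one element at a time, carrying a state vector" description concrete, here is a minimal sketch in plain NumPy; the dimensions are toy values and the weights are random, so it only shows the data flow, not a trained model:

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence, one element at a time.

    inputs: list of input vectors x_t, each of shape (input_dim,)
    Returns the list of hidden-state vectors h_1 ... h_T.
    """
    h = np.zeros(W_hh.shape[0])       # initial state: no history yet
    states = []
    for x in inputs:                  # process the sequence element by element
        # the new state mixes the current input with the previous state,
        # so h implicitly summarizes the history of the sequence so far
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

# toy usage: a sequence of 5 random 8-dimensional inputs, 16 hidden units
rng = np.random.default_rng(0)
xs = [rng.normal(size=8) for _ in range(5)]
W_xh = rng.normal(scale=0.1, size=(16, 8))
W_hh = rng.normal(scale=0.1, size=(16, 16))
b_h = np.zeros(16)
hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(len(hs), hs[-1].shape)          # 5 time steps, each state is 16-dimensional
```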
RNNs are very powerful dynamical systems, but training them has been a real problem: because the backpropagated gradients shrink or grow at every time step, after many steps they typically explode or vanish.
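The instability is easy to see with a back-of-the-envelope calculation: if backpropagation scales the gradient by roughly the same factor at every time step, its size follows a geometric progression (the factors below are made up purely for illustration):

```python
# the same per-step factor applied over many steps: <1 vanishes, >1 explodes
for scale in (0.9, 1.0, 1.1):
    print(scale, [scale ** steps for steps in (10, 50, 100)])
# 0.9 shrinks towards zero (vanishing gradient), 1.1 blows up (exploding gradient)
```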
Thanks to advances in their architecture and in ways of training them, RNNs have turned out to be very good at predicting the next character in text. They have also been applied to more complex tasks: after reading an English sentence, an encoder network can learn a representation of it, so that the vector in its hidden units captures the original sentence well. This vector is then used to initialize the hidden state of a decoder network, which outputs a probability distribution over the first word of the French translation. Once the first word is chosen, the decoder outputs a distribution over the second word, and so on until the translation is finished. Overall, the process generates a French translation of an English sentence according to these probabilities. This rather straightforward approach to machine translation has quickly become competitive, and it raises doubts about whether understanding a sentence requires rule-like symbolic manipulation, or whether understanding instead arises from many factors acting together.
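A schematic sketch of this encoder-decoder scheme in NumPy; the weights are random (standing in for a trained model), decoding is greedy, and all names, sizes and the bos/eos conventions are invented here for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
EMB, HID, V_SRC, V_TGT = 8, 16, 30, 30             # toy sizes, chosen arbitrarily

E_src = rng.normal(scale=0.1, size=(V_SRC, EMB))   # source word embeddings
E_tgt = rng.normal(scale=0.1, size=(V_TGT, EMB))   # target word embeddings
W_enc = rng.normal(scale=0.1, size=(HID, EMB + HID))
W_dec = rng.normal(scale=0.1, size=(HID, EMB + HID))
W_out = rng.normal(scale=0.1, size=(V_TGT, HID))   # hidden state -> word scores

def step(W, x, h):
    return np.tanh(W @ np.concatenate([x, h]))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def translate(src_ids, max_len=10, bos=0, eos=1):
    # encoder: read the source sentence; the final state summarizes it
    h = np.zeros(HID)
    for i in src_ids:
        h = step(W_enc, E_src[i], h)
    # decoder: start from the sentence summary and emit one word at a time
    out, word = [], bos
    for _ in range(max_len):
        h = step(W_dec, E_tgt[word], h)
        probs = softmax(W_out @ h)       # distribution over the next target word
        word = int(probs.argmax())       # greedy choice (beam search in practice)
        if word == eos:
            break
        out.append(word)
    return out

print(translate([5, 7, 3]))   # with random weights the output is of course meaningless
```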
Besides translating a sentence from one language to another, the same idea can be used to "translate" the meaning of an image into an English caption. The encoder is a ConvNet that converts the pixels into an activation vector, and the decoder is an RNN of the kind used for machine translation and language modelling. There has been a surge of interesting work in this direction recently.
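The captioning variant only changes where the decoder's initial state comes from; a rough sketch with a stand-in function in place of a real trained ConvNet (everything here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
FEAT, HID = 32, 16                                  # toy sizes

def fake_convnet(image):
    """Stand-in for a trained ConvNet encoder: pixels -> activation vector."""
    flat = image.reshape(-1)
    W = rng.normal(scale=0.01, size=(FEAT, flat.size))  # would be learned filters
    return np.tanh(W @ flat)

W_init = rng.normal(scale=0.1, size=(HID, FEAT))    # project feature into state space
image = rng.random((8, 8, 3))                       # dummy "image"
h0 = np.tanh(W_init @ fake_convnet(image))          # initial state for the RNN decoder
print(h0.shape)   # the caption decoder then runs exactly like the translation decoder
```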
An unrolled RNN can be seen as a deep feedforward network in which every layer shares the same weights. Although their main purpose is to learn long-term dependencies, theoretical and empirical evidence shows that they have difficulty storing information for very long.
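Backpropagation through the unrolled network treats each time step as a layer; since all of those layers share the same recurrent weight matrix, its gradient is the sum of per-step contributions, and the repeated multiplication by the transposed weights in the backward pass is exactly what makes long-range information fragile. A minimal sketch (vanilla RNN, loss gradient supplied only at the last step; names invented here):

```python
import numpy as np

def bptt_recurrent_grad(xs, W_xh, W_hh, grad_last):
    """Backprop through time for a vanilla RNN, returning dLoss/dW_hh.

    Every unrolled "layer" shares W_hh, so its gradient is the SUM of the
    contributions from all time steps. grad_last is the loss gradient with
    respect to the final hidden state."""
    h = np.zeros(W_hh.shape[0])
    hs = [h]
    for x in xs:                               # forward: unroll over time
        h = np.tanh(W_xh @ x + W_hh @ h)
        hs.append(h)
    g = grad_last                              # gradient w.r.t. the last state h_T
    dW_hh = np.zeros_like(W_hh)
    for t in reversed(range(len(xs))):         # backward through the unrolled net
        da = g * (1.0 - hs[t + 1] ** 2)        # through the tanh at step t
        dW_hh += np.outer(da, hs[t])           # shared-weight contribution of step t
        g = W_hh.T @ da                        # pass the gradient back to h_{t-1}
    return dW_hh

rng = np.random.default_rng(0)
xs = [rng.normal(size=4) for _ in range(6)]
W_xh = rng.normal(scale=0.1, size=(8, 4))
W_hh = rng.normal(scale=0.1, size=(8, 8))
print(bptt_recurrent_grad(xs, W_xh, W_hh, np.ones(8)).shape)   # (8, 8)
```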
To correct for that, the LSTM (long short-term memory) network was invented. It has a memory cell that behaves like an accumulator: at each time step it carries its previous contents forward and adds the new input to them, so the network can remember information from much earlier steps; in addition, a learned gate acts as a small switch that decides when this memory should be cleared.
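A minimal sketch of one step of such a cell, in the standard LSTM formulation (bias terms omitted for brevity; the names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM time step. c is the memory cell: it accumulates new content
    on top of its previous value, and the forget gate f decides when to wipe it."""
    Wf, Wi, Wo, Wg = params
    z = np.concatenate([x, h])
    f = sigmoid(Wf @ z)          # forget gate: keep or clear the old memory
    i = sigmoid(Wi @ z)          # input gate: how much new content to write
    o = sigmoid(Wo @ z)          # output gate: how much of the cell to expose
    g = np.tanh(Wg @ z)          # candidate content for this step
    c = f * c + i * g            # accumulator: previous memory plus new input
    h = o * np.tanh(c)           # new hidden state read out of the memory
    return h, c

# toy usage with random weights
rng = np.random.default_rng(3)
IN, HID = 4, 8
params = [rng.normal(scale=0.1, size=(HID, IN + HID)) for _ in range(4)]
h, c = np.zeros(HID), np.zeros(HID)
for t in range(5):
    h, c = lstm_step(rng.normal(size=IN), h, c, params)
print(h.shape, c.shape)
```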
LSTM networks have proved to be more effective than conventional RNNs, especially when they have several layers for each time step, making it possible to build end-to-end speech-to-text systems.
Over the past year, many groups have proposed different ways of augmenting RNNs with a memory module. They include the Neural Turing Machine, in which the network is given a tape-like memory that the RNN can choose to read from or write to, and memory networks, in which an ordinary network is augmented with a kind of associative memory. Memory networks have performed well on standard question-answering benchmarks: the memory stores the story about which the network is later asked questions.
Beyond simple memorization, neural Turing machines and memory networks are also being used for tasks that would normally require general reasoning and symbol manipulation; neural Turing machines can even be taught algorithms.
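Both ideas rest on a differentiable, content-based read from a memory matrix, which is what keeps the whole system trainable by backpropagation. A heavily simplified sketch of just such a read (real neural Turing machines also have write heads and location-based addressing; everything here is our own toy illustration):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def content_read(memory, key, sharpness=5.0):
    """Soft, content-based read: a weighted average over all memory slots,
    with the weights given by how well each slot matches the query key."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)   # cosine similarity
    weights = softmax(sharpness * sims)                    # attention over slots
    return weights @ memory, weights

memory = np.eye(6, 10)                       # toy memory: 6 slots of width 10
key = memory[2] + 0.05 * np.ones(10)         # a query close to slot 2's contents
value, w = content_read(memory, key)
print(np.round(w, 2))                        # most of the weight lands on slot 2
```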
RNN exploding or vanishing gradients
77. Hochreiter, S. Untersuchungen zu dynamischen neuronalen Netzen
[in German] Diploma thesis, T.U. Münich (1991).
78. Bengio, Y., Simard, P. & Frasconi, P. Learning long-term
dependencies with gradient descent is difficult. IEEE Trans. Neural Networks 5,
157–166 (1994).
Architectural advantages of RNNs
79. Hochreiter, S. & Schmidhuber, J. Long short-term memory.
Neural Comput. 9, 1735–1780 (1997).
This paper introduced LSTM
recurrent networks, which have become a crucial ingredient in recent advances
with recurrent networks because they are good at learning long-range
dependencies.
80. ElHihi, S. & Bengio, Y. Hierarchical recurrent neural
networks for long-term dependencies. In Proc. Advances in Neural Information
Processing Systems 8
http://papers.nips.cc/paper/1102-hierarchical-recurrent-neural-networks-forlong-term-dependencies
(1995).
RNN training methods
81. Sutskever, I. Training Recurrent Neural Networks. PhD thesis,
Univ. Toronto (2012).
82. Pascanu, R., Mikolov, T. & Bengio, Y. On the difficulty of
training recurrent neural networks. In Proc. 30th International Conference on
Machine Learning 1310–1318 (2013).
Predicting the next character in text
83. Sutskever, I., Martens, J. & Hinton, G. E. Generating text
with recurrent neural networks. In Proc. 28th International Conference on
Machine Learning 1017– 1024 (2011).
Predicting words
75. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J.
Distributed representations of words and phrases and their compositionality. In
Proc. Advances in Neural Information Processing Systems 26 3111–3119 (2013).
Machine translation
17. Sutskever, I. Vinyals, O. & Le. Q. V. Sequence to sequence
learning with neural networks. In Proc. Advances in Neural Information
Processing Systems 27 3104–3112 (2014).
This paper showed state-of-the-art machine translation results with the architecture
introduced in ref. 72, with a recurrent network trained to read a sentence in one
language, produce a semantic representation of its meaning, and generate a translation
in another language.
72. Cho, K. et al. Learning phrase representations using RNN
encoder-decoder for statistical machine translation. In Proc. Conference on
Empirical Methods in Natural Language Processing 1724–1734 (2014).
76. Bahdanau, D., Cho, K. & Bengio, Y. Neural machine
translation by jointly learning to align and translate. In Proc. International
Conference on Learning Representations http://arxiv.org/abs/1409.0473 (2015).
Understanding language
84. Lakoff, G. & Johnson, M. Metaphors We Live By (Univ. Chicago
Press, 2008).
85. Rogers, T. T. & McClelland, J. L. Semantic Cognition: A
Parallel Distributed Processing Approach (MIT Press, 2004).
image->description
86. Xu, K. et al. Show, attend and tell: Neural image caption
generation with visual attention. In Proc. International Conference on Learning
Representations http://arxiv.org/abs/1502.03044 (2015).
RNNs have difficulty storing information for a long time
78. Bengio, Y., Simard, P. & Frasconi, P. Learning long-term
dependencies with gradient descent is difficult. IEEE Trans. Neural Networks 5,
157–166 (1994).
LSTMs have several layers for each time step
87. Graves, A., Mohamed, A.-R. & Hinton, G. Speech recognition
with deep recurrent neural networks. In Proc. International Conference on
Acoustics, Speech and Signal Processing 6645–6649 (2013).
Neural Turing machines
88. Graves, A., Wayne, G. &
Danihelka, I. Neural Turing machines. http://arxiv.org/abs/1410.5401 (2014).
Memory networks
89. Weston, J. Chopra, S. & Bordes, A. Memory networks.
http://arxiv.org/abs/1410.3916 (2014).
Neural networks learning to sort
88. Graves, A., Wayne, G. & Danihelka, I. Neural Turing
machines. http://arxiv.org/abs/1410.5401 (2014).
Networks trained to answer questions that require some inference
90. Weston, J., Bordes, A., Chopra, S. & Mikolov, T. Towards
AI-complete question answering: a set of prerequisite toy tasks. http://arxiv.org/abs/1502.05698 (2015).
After reading a 15-sentence version of The Lord of the Rings, a memory network learns to answer questions such as where Frodo is
89. Weston, J. Chopra, S. & Bordes, A. Memory networks.
http://arxiv.org/abs/1410.3916 (2014).
The future of deep learning
Unsupervised learning is actually very interesting, but it has been overshadowed by the success of supervised learning. Even though it is not the focus here, we expect it to become far more important in the long run: human and animal learning is largely unsupervised, in that we discover the structure of the world by observing it, not by being told what everything is.
Looking ahead, we expect ConvNets combined with RNNs that use reinforcement learning to decide where to look. Systems that combine deep learning with reinforcement learning are still in their infancy, but they already beat passive vision systems at classification tasks and can learn to play video games.
Natural language processing is another area where deep learning should have a big impact: RNNs can be used to understand sentences or whole documents, and they are expected to get much better once they learn strategies for selectively attending to one part of the input at a time.
Ultimately, major progress in artificial intelligence will come from combining representation learning with other kinds of reasoning; in the end, new paradigms will be needed to replace rule-based manipulation of symbolic expressions with operations on large vectors.
Unsupervised learning
91. Hinton, G. E., Dayan, P., Frey, B. J. & Neal, R. M. The
wake-sleep algorithm for unsupervised neural networks. Science 268, 1158–1161
(1995).
92. Salakhutdinov, R. & Hinton, G. Deep Boltzmann machines. In
Proc. International Conference on Artificial Intelligence and Statistics
448–455 (2009).
93. Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A.
Extracting and composing robust features with denoising autoencoders. In Proc.
25th International Conference on Machine Learning 1096–1103 (2008).
94. Kavukcuoglu, K. et al. Learning convolutional feature
hierarchies for visual recognition. In Proc. Advances in Neural Information
Processing Systems 23 1090–1098 (2010).
95. Gregor, K. & LeCun, Y. Learning fast approximations of
sparse coding. In Proc. International Conference on Machine Learning 399–406
(2010).
96. Ranzato, M., Mnih, V., Susskind, J. M. & Hinton, G. E.
Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine
Intell. 35, 2206–2222 (2013).
97. Bengio, Y., Thibodeau-Laufer, E., Alain, G. & Yosinski, J.
Deep generative stochastic networks trainable by backprop. In Proc. 31st
International Conference on Machine Learning 226–234 (2014).
98. Kingma, D., Rezende, D., Mohamed, S. & Welling, M.
Semi-supervised learning with deep generative models. In Proc. Advances in
Neural Information Processing Systems 27 3581–3589 (2014).
CNN+RNN using reinforcement learning for visual classification
99. Ba, J., Mnih, V. & Kavukcuoglu, K. Multiple object
recognition with visual attention. In Proc. International Conference on
Learning Representations http://arxiv.org/abs/1412.7755 (2014).
Deep learning combined with reinforcement learning to play games
100. Mnih, V. et al. Human-level control through deep reinforcement
learning. Nature 518, 529–533 (2015).
RNNs learn strategies for selectively attending to one part at a time
76. Bahdanau, D., Cho, K. & Bengio, Y. Neural machine
translation by jointly learning to align and translate. In Proc. International
Conference on Learning Representations http://arxiv.org/abs/1409.0473 (2015).
86. Xu, K. et al. Show, attend and tell: Neural image caption
generation with visual attention. In Proc. International Conference on Learning
Representations http://arxiv.org/abs/1502.03044 (2015).
Operations on large vectors instead of rule-based symbolic manipulation
101. Bottou, L. From machine learning to machine reasoning. Mach.
Learn. 94, 133–149 (2014).
RNNs attending to particular locations in an image
102. Vinyals, O., Toshev, A., Bengio, S. & Erhan, D. Show and
tell: a neural image caption generator. In Proc. International Conference on
Machine Learning http://arxiv.org/abs/1502.03044 (2014).
t-SNE
103. van der Maaten, L. & Hinton, G. E. Visualizing data using
t-SNE. J. Mach. Learn. Research 9, 2579–2605 (2008).