Paper Notes - Sequence to Sequence Learning with Neural Networks

The overall idea is the same as the RNN encoder-decoder: the encoder reads the source sentence into a fixed-length vector, and the decoder generates the translation from that vector; here both parts are implemented with LSTMs.

The paper highlights three important points:

1) The encoder and decoder are two separate LSTM models.

2) Deep LSTMs perform better than shallow ones; the paper uses a 4-layer LSTM.

3) In practice, reversing the input sentence before training gives better results (see the sketch after this list). So for example, instead of mapping the sentence a, b, c to the sentence α, β, γ, the LSTM is asked to map c, b, a to α, β, γ, where α, β, γ is the translation of a, b, c. This way, a is in close proximity to α, b is fairly close to β, and so on, a fact that makes it easy for SGD to "establish communication" between the input and the output.
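To make the three points concrete, here is a minimal PyTorch sketch; it is my own illustration, not the authors' code, and the `Seq2Seq` class name, toy vocabulary sizes, and embedding/hidden dimensions are all assumptions for the example:

```python
# Minimal seq2seq sketch illustrating the three points above,
# assuming PyTorch and toy dimensions (not the paper's exact setup).
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID, LAYERS = 1000, 1000, 256, 512, 4

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        # Point 1: the encoder and decoder are two separate LSTMs.
        # Point 2: both are deep (4 layers), as in the paper.
        self.encoder = nn.LSTM(EMB, HID, num_layers=LAYERS)
        self.decoder = nn.LSTM(EMB, HID, num_layers=LAYERS)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt):
        # src, tgt: (seq_len, batch) tensors of token ids.
        # Point 3: reverse the source sentence along the time axis,
        # so "a b c" is fed to the encoder as "c b a".
        src = src.flip(0)
        _, (h, c) = self.encoder(self.src_emb(src))
        # The decoder starts from the encoder's final state, i.e. the
        # fixed-length summary of the source sentence.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), (h, c))
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (7, 2))  # (src_len, batch)
tgt = torch.randint(0, TGT_VOCAB, (9, 2))  # (tgt_len, batch)
logits = model(src, tgt)                   # (tgt_len, batch, TGT_VOCAB)
```

Handing the encoder's final (h, c) state to the decoder is what couples the two otherwise independent LSTMs; reversing the source only changes the order in which the encoder sees the tokens, not the architecture itself.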

