Gated-Attention Readers for Text Comprehension


In this paper we study the problem of answering cloze-style questions over short documents. We introduce a new attention mechanism which uses multiplicative interactions between the query embedding and intermediate states of a recurrent neural network reader. This enables the reader to build query-specific representations of tokens in the document which are further used for answer selection. Our model, the Gated-Attention Reader, outperforms all state-of-the-art models on several large-scale benchmark datasets for this task—the CNN & Daily Mail news stories and Children’s Book Test. We also provide a detailed analysis of the performance of our model and several baselines over a subset of questions manually annotated with certain linguistic features. The analysis sheds light on the strengths and weaknesses of several existing models.

The core idea is to use an attention mechanism that introduces multiplicative interactions between the query representation and the intermediate states of the reader. This interaction lets the document being read and the query work together, strengthening the flow of information so that the correct answer can be learned. It resembles human reading: when we look for the answer to a specific question, we also focus our attention on the parts of the document most relevant to that question. As a result, the reader can attend to different parts of the document depending on the given query.

Walkthrough of the core algorithm:

First, a bidirectional GRU encodes the query; the final forward and backward states are concatenated to form the query representation q. Each layer uses its own GRU encoder.
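A minimal sketch of such a per-layer query encoder, assuming PyTorch; module and variable names here are illustrative, not taken from the authors' code:

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    """Bi-GRU query encoder; the query vector q is the concatenation
    of the final forward and backward hidden states."""

    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.bigru = nn.GRU(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, query_embeds: torch.Tensor) -> torch.Tensor:
        # query_embeds: (batch, query_len, embed_dim)
        outputs, last_hidden = self.bigru(query_embeds)
        # last_hidden: (2, batch, hidden_dim) -- final forward and backward states
        q = torch.cat([last_hidden[0], last_hidden[1]], dim=-1)
        return q  # (batch, 2 * hidden_dim)
```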

The Gated-Attention mechanism applies an element-wise multiplication between the query embedding q_{i-1} and the outputs e_{i-1} from the previous layer, i.e. x_i = q_{i-1} ⊙ e_{i-1}.

At every layer, the query representation operates on each word of the document; the authors call this gated attention. The operation is an element-wise multiplication, which differs from traditional attention, where a weighted sum is taken over the words.
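Below is a minimal PyTorch sketch of one gated-attention layer. Following the paper, each document token first attends over the query tokens to form a token-specific query vector, which then gates the token by element-wise multiplication; all tensor names and shapes are illustrative assumptions.

```python
import torch

def gated_attention(doc: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    # doc:   (batch, doc_len, dim)   -- document Bi-GRU outputs e
    # query: (batch, query_len, dim) -- query Bi-GRU outputs q
    scores = torch.bmm(doc, query.transpose(1, 2))  # (batch, doc_len, query_len)
    alpha = torch.softmax(scores, dim=-1)           # attention over query words
    q_tilde = torch.bmm(alpha, query)               # token-specific query vectors
    return doc * q_tilde                            # element-wise gating, not a weighted sum
```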

To obtain the probability that a particular token in the document answers the query, we take an inner product between the last-layer outputs q_K and e_{K,t} and pass it through a softmax layer: s_t = softmax_t(q_K · e_{K,t}).

At the last layer, the inner product (dot product between two vectors) yields a probability distribution over document positions, and the attention-sum idea is then applied: the predicted probabilities of identical document words are added together. Since the distribution is over tokens in the document, if the same word occurs at several positions, the probabilities at all of those positions are summed to give the word's final predicted probability. In this way, the answer is looked up directly in the document; cloze-style reading comprehension assumes the answer appears in the document at least once, so at the final stage the model can point to the answer directly in the text.
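A minimal sketch of this answer-selection step, assuming PyTorch. The scatter_add_ trick for summing the probabilities of repeated words is one common way to implement the attention-sum step, not necessarily the authors' implementation:

```python
import torch

def answer_probs(doc_final: torch.Tensor, q_final: torch.Tensor,
                 token_ids: torch.Tensor, vocab_size: int) -> torch.Tensor:
    # doc_final: (batch, doc_len, dim) -- last-layer document outputs e_K
    # q_final:   (batch, dim)          -- last-layer query vector q_K
    # token_ids: (batch, doc_len)      -- int64 word id at each document position
    logits = torch.bmm(doc_final, q_final.unsqueeze(-1)).squeeze(-1)  # inner products
    probs = torch.softmax(logits, dim=-1)            # distribution over positions
    # Attention-sum: add up the probabilities of all positions holding the same word.
    word_probs = torch.zeros(token_ids.size(0), vocab_size)
    word_probs.scatter_add_(1, token_ids, probs)
    return word_probs  # (batch, vocab_size)
```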


Experimental analysis

Three parameters were tuned on the validation set for each dataset: the number of layers K, the GRU hidden state sizes (both query and document) d, and the dropout rate p. We experimented with K = 2, 3, d = 256, 384, and p = 0.1, 0.2, 0.3, 0.4, 0.5. Memory constraints prevented us from experimenting with higher K.
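For concreteness, the search grid described above can be enumerated as follows; the training call is a placeholder, not part of the paper's code:

```python
from itertools import product

grid = {
    "K": [2, 3],                     # number of gated-attention layers
    "d": [256, 384],                 # GRU hidden size (query and document)
    "p": [0.1, 0.2, 0.3, 0.4, 0.5],  # dropout rate
}

for K, d, p in product(grid["K"], grid["d"], grid["p"]):
    print(f"train with K={K}, d={d}, dropout={p}")  # placeholder for a training run
```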
