[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

论文链接:https://arxiv.org/pdf/1502.03044.pdf

代码链接:https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow



主要贡献

在这篇文章中,作者将“注意力机制(Attention Mechanism)”引入了神经机器翻译(Neural Image Captioning)领域,提出了两种不同的注意力模型:‘Soft’ Deterministic Attention Mechanism & ‘Hard’ Stochastic Attention Mechanism。下图展示了Show, Attend and Tell模型的整体框架。

  • Stochastic ‘Hard‘ Attention
  • Deterministic ‘Soft‘ Attention


实验细节

原文地址:https://www.cnblogs.com/zlian2016/p/10987263.html

时间: 2024-08-30 15:09:46

[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention的相关文章

Paper Reading: Stereo DSO

开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras Abstract Optimization objectives: intrinsic/extrinsic parameters of all keyframes all selected pixels' depth Inte

paper 27 :图像/视觉显著性检测技术发展情况梳理(Saliency Detection、Visual Attention)

1. 早期C. Koch与S. Ullman的研究工作. 他们提出了非常有影响力的生物启发模型. C. Koch and S. Ullman . Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4(4):219-227, 1985. C. Koch and T. Poggio. Predicting the Visual World: Silenc

CVPR 2016 paper reading (6)

1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, Raquel Urtasun, in CVPR 2015. Goal: learn and predict how fashionable a person looks on a photograph, and suggest subtle

【Paper Reading】R-CNN(V5)论文解读

R-CNN论文:Rich feature hierarchies for accurate object detection and semantic segmentation 用于精确目标检测和语义分割的丰富特征层次结构作者:Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik,UC Berkeley(加州大学伯克利分校)一作者Ross Girshick的个人首页:http://www.rossgirshick.info/,有其

Paper Reading: Perceptual Generative Adversarial Networks for Small Object Detection

Perceptual Generative Adversarial Networks for Small Object Detection 2017-07-11  19:47:46   CVPR 2017 This paper use GAN to handle the issue of small object detection which is a very hard problem in general object detection. As shown in the followin

【Paper Reading】Object Recognition from Scale-Invariant Features

Paper: Object Recognition from Scale-Invariant Features Sorce: http://www.cs.ubc.ca/~lowe/papers/iccv99.pdf SIFT 即Scale Invariant Feature Transfrom, 尺度不变变换,由David Lowe提出.是CV最著名也最常用的特征.在图像目标识别的应用中,常常要求图像的特征有很好的roboust即不容易受到平移,旋转,尺度缩放,光照,仿射的英雄.SIFT算子具有

Paper Reading: Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking

Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual TrackingECCV 2016  The key point of KCF is the ability to efficiently exploit available negative data by including all shifted versions of a training sample, in anthor w

paper reading in 1/1/2016~1/3/2016

CVPR15:Person Count Localization in Videos from Noisy Foreground and Detections paper主要的contribution是定义了person count localization及其周边,不过虽然之前提过的person count问题常用结果评估标准是只看最后给出的counts,但其实之前的文章也并不完全是只给出global counts的.文中可能是更加重视这个localization的问题并且确实是利用这个信息去

Paper Reading 1 - Playing Atari with Deep Reinforcement Learning

来源:NIPS 2013 作者:DeepMind 理解基础: 增强学习基本知识 深度学习 特别是卷积神经网络的基本知识 创新点:第一个将深度学习模型与增强学习结合在一起从而成功地直接从高维的输入学习控制策略 详细是将卷积神经网络和Q Learning结合在一起.卷积神经网络的输入是原始图像数据(作为状态)输出则为每一个动作相应的价值Value Function来预计未来的反馈Reward 实验成果:使用同一个网络学习玩Atari 2600 游戏.在測试的7个游戏中6个超过了以往的方法而且好几个超