(Paper reading) Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions

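Given a knowledge base containing a set of entities E, and a text collection in which a set of entity mentions M has already been identified, the goal of entity linking is to map each mention m∈M to its corresponding entity e∈E in the knowledge base. A mention that has no corresponding entity in the knowledge base is called an unlinkable mention, and an entity linking system gives such mentions the special label NIL.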
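The task definition above can be written down as a small interface. This is only a sketch of the inputs and outputs, not a method from the paper: the `Mention` fields and the exact-match `link` function are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Set

NIL = "NIL"  # special label for unlinkable mentions


@dataclass(frozen=True)
class Mention:
    surface: str  # the mention string as it appears in the text
    doc_id: str   # hypothetical identifier of the source document
    start: int    # hypothetical character offset inside that document


def link(mention: Mention, knowledge_base: Set[str]) -> str:
    """Toy linker: return the entity whose name equals the mention, else NIL.

    A real system replaces this exact-match lookup with candidate
    generation, candidate ranking, and unlinkable mention prediction.
    """
    if mention.surface in knowledge_base:
        return mention.surface
    return NIL
```

The point of the sketch is the contract: every mention m∈M maps either to one entity e∈E or to NIL.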

A typical entity linking system consists of three modules:

  • Candidate entity generation

  For each mention m in M, the entity linking system retrieves a set of candidate entities Em from the knowledge base. The main approaches are:

    • dictionary based techniques

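      Build a name dictionary from Wikipedia features (such as redirect pages, disambiguation pages, and hyperlink anchor text), then look each mention up in that dictionary.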

    • surface form expansion from the local document

      Expand the mention m into its full name, aliases, and other variants found in the document where it appears.

      • Heuristic Based Methods
      • Supervised Learning Methods
    • methods based on search engines

      Some search engines can already find entities with similar names, so some methods query a search engine directly to obtain candidates.
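The first two candidate generation techniques can be sketched together as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the dictionary is built from hand-written (surface form, entity) pairs rather than real Wikipedia data, and the expansion step is one simple heuristic that only handles acronyms.

```python
import re
from collections import defaultdict


def build_name_dictionary(pairs):
    """Map each lower-cased surface form to the set of entities it may denote."""
    dictionary = defaultdict(set)
    for surface, entity in pairs:
        dictionary[surface.lower()].add(entity)
    return dictionary


def expand_acronym(mention, document):
    """Heuristic surface form expansion for acronyms.

    Looks for a run of capitalized words in the document whose initials
    spell the mention; returns the mention unchanged if none is found.
    """
    if not mention.isupper():
        return mention
    n = len(mention)
    words = re.findall(r"[A-Za-z]+", document)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        initials = "".join(w[0] for w in window)
        if initials == mention and all(w[0].isupper() for w in window):
            return " ".join(window)
    return mention


def generate_candidates(mention, document, dictionary):
    """Union of dictionary hits for the mention and for its expanded form."""
    expanded = expand_acronym(mention, document)
    return (dictionary.get(mention.lower(), set())
            | dictionary.get(expanded.lower(), set()))
```

An ambiguous name such as "Jordan" yields several candidates, which is why a ranking step is needed afterwards. An acronym such as "ABC" may have no dictionary entry of its own, but it can still reach a candidate through the full name found in the same document.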

  • Candidate entity ranking

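  Rank the candidate entities by some criteria and select the one most likely to be the target of the mention.

  The ranking criteria are built from features. Context-independent features include name string comparison, entity popularity, and entity type; they depend only on the mention itself and its candidate entities. Context-dependent features require analyzing the context in which the mention appears, and include textual context and coherence between mapping entities.

  The main approaches to ranking the candidate set are: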

    • supervised ranking methods

      • binary classification methods
      • learning to rank methods
      • probabilistic methods
      • graph based approaches
      • model combination
      • training data generation
    • unsupervised ranking methods
      • VSM based methods
      • information retrieval based methods
  • Unlinkable mention prediction

  Check whether the top-ranked candidate entity is really the target entity of m; if none of the candidates is, label m as an unlinkable mention (NIL).
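The last two modules can be sketched with an unsupervised VSM based ranker followed by a simple NIL check. This is an illustrative sketch under assumptions, not a method taken from the paper: it uses raw term counts with cosine similarity, a tiny hand-picked stop word list, a fixed NIL threshold of 0.1, and hypothetical function names and entity descriptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for the example.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "who", "with"}


def bag_of_words(text):
    """Term-count vector of the text, lower-cased, without stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_candidates(context, candidate_descriptions):
    """Score each candidate by similarity between mention context and its description."""
    ctx = bag_of_words(context)
    scored = [(cosine(ctx, bag_of_words(desc)), entity)
              for entity, desc in candidate_descriptions.items()]
    return sorted(scored, reverse=True)


def link_or_nil(context, candidate_descriptions, threshold=0.1):
    """Return the top-ranked entity, or NIL if its score is below the threshold."""
    ranked = rank_candidates(context, candidate_descriptions)
    if not ranked or ranked[0][0] < threshold:
        return "NIL"
    return ranked[0][1]
```

The threshold is the simplest possible form of unlinkable mention prediction. A learned classifier over the top-ranked candidate's features is a common alternative, and in practice the threshold value would be tuned on labeled data rather than fixed by hand.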

The main applications of entity linking are:

  • Information Extraction
  • Information Retrieval
  • Content Analysis
  • Question Answering
  • Knowledge Base Population

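The authors suggest the following directions for future research:

1. Linking mentions that occur in other kinds of data, rather than only those in plain text.

2. Addressing computational complexity, efficiency, and scalability.

3. Building domain-specific entity linking systems.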

Time: 2024-08-25 02:12:54
