## 2017-08-01
## Notes on lec2-2.pdf: Language Models
### Evaluating a Language Model
### Intuition about Perplexity
### Evaluating N-grams with Perplexity
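For reference, perplexity is the inverse probability that the model assigns to held-out text, normalized per word; lower perplexity means the model fits the data better. A minimal sketch of computing it from per-token log-probabilities (function and variable names are illustrative, not from the lecture):

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities:
    exp(-(1/N) * sum_i log P(w_i | history_i))."""
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# Example: a model that assigns probability 0.1 to each of 4 test tokens
print(perplexity([math.log(0.1)] * 4))  # ≈ 10.0
```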
### Sparsity Is Always a Problem
### Dealing with Sparsity
General approach: modify observed counts to improve estimates
– Discounting: allocate probability mass for unobserved events by discounting the counts of observed events
– Interpolation: approximate the counts of an N-gram using a combination of estimates from related, denser histories
– Back-off: approximate the counts of an unobserved N-gram from the proportion of back-off events (e.g., the (N-1)-gram); see the sketch after this list
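A minimal sketch of the back-off idea: use the bigram estimate when the bigram was observed, otherwise fall back to a scaled unigram estimate. This follows the simplified "stupid back-off" style of unnormalized scores rather than any specific method from the lecture; the names and the 0.4 weight are illustrative assumptions.

```python
def backoff_score(bigram_counts, unigram_counts, total_tokens,
                  prev, word, alpha=0.4):
    """Score word given prev: bigram relative frequency if observed,
    otherwise back off to the unigram frequency scaled by alpha."""
    if bigram_counts.get((prev, word), 0) > 0:
        return bigram_counts[(prev, word)] / unigram_counts[prev]
    return alpha * unigram_counts.get(word, 0) / total_tokens
```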
### Add-One Smoothing
- We have V words in the vocabulary; N is the number of words in the training set
- Smooth the observed counts by adding one to every count and renormalizing
  – Unigram case: $P(w) = \frac{\text{count}(w) + 1}{N + V}$
  – Bigram case: $P(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}, w_i) + 1}{\text{count}(w_{i-1}) + V}$
- More general case: add-α, where α is added instead of one (see the sketch below)
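A sketch of the add-α bigram estimate under the definitions above (with α = 1 it reduces to add-one smoothing); the function and variable names are mine, not from the lecture:

```python
from collections import Counter

def add_alpha_bigram_prob(prev, word, bigram_counts, unigram_counts, V, alpha=1.0):
    """P(word | prev) = (count(prev, word) + alpha) / (count(prev) + alpha * V)."""
    return (bigram_counts.get((prev, word), 0) + alpha) / \
           (unigram_counts.get(prev, 0) + alpha * V)

# Tiny usage example on a toy corpus
tokens = "the cat sat on the mat".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)
V = len(unigram_counts)
print(add_alpha_bigram_prob("the", "cat", bigram_counts, unigram_counts, V))  # (1+1)/(2+5)
```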
### Linear Interpolation
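For reference, the standard two-way interpolated bigram estimate (a textbook form, assumed here rather than copied from the slide) mixes the maximum-likelihood bigram and unigram estimates with a weight λ that is tuned on held-out data:

$$P_{\text{interp}}(w_i \mid w_{i-1}) = \lambda\, P_{\text{ML}}(w_i \mid w_{i-1}) + (1 - \lambda)\, P_{\text{ML}}(w_i), \qquad 0 \le \lambda \le 1$$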
### Tuning Hyperparameters
- Both add-α and linear interpolation have hyperparameters (α and the interpolation weight λ, respectively).
- The selection of their values is crucial for smoothing performance.
- Their values are tuned to maximize the likelihood of held-out data (a simple grid-search version is sketched below).
  – For linear interpolation, we will use EM to find the optimal parameters (in a few lectures).
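A sketch of the tuning step for the interpolation weight λ: evaluate the held-out log-likelihood over a small grid and keep the best value. This uses grid search rather than the EM procedure mentioned above, and the probability functions p_bigram and p_unigram are assumed to be supplied by the caller:

```python
import math

def heldout_loglik(lmbda, heldout_bigrams, p_bigram, p_unigram):
    """Log-likelihood of held-out (prev, word) pairs under
    lambda * P_ML(word | prev) + (1 - lambda) * P_ML(word)."""
    total = 0.0
    for prev, word in heldout_bigrams:
        p = lmbda * p_bigram(prev, word) + (1 - lmbda) * p_unigram(word)
        total += math.log(p) if p > 0 else float("-inf")
    return total

def tune_lambda(heldout_bigrams, p_bigram, p_unigram, grid=None):
    """Pick the lambda in the grid that maximizes held-out likelihood."""
    grid = grid or [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    return max(grid, key=lambda lam: heldout_loglik(lam, heldout_bigrams,
                                                    p_bigram, p_unigram))
```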
### Kneser-Ney Smoothing
- Observed n-grams tend to occur more often in the training data than in new data.
- Absolute discounting: count*(x) = count(x) - d
  $$P_{\text{AD}}(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}, w_i) - d}{\text{count}(w_{i-1})} + \alpha\, \tilde{P}(w_i)$$
- Distribute the remaining mass based on the skewness of the lower-order N-gram distribution, i.e., the number of distinct words a word can follow:
  $$\tilde{P}(w_i) \propto \bigl|\{\, w_{i-1} : \text{count}(w_{i-1}, w_i) > 0 \,\}\bigr|$$
- Kneser-Ney has repeatedly proven to be a very successful estimator.
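A minimal sketch of an interpolated absolute-discounting / Kneser-Ney bigram estimator along the lines of the formulas above. The identifiers, the default discount d = 0.75, and the handling of unseen histories are my assumptions, not details from the lecture:

```python
from collections import Counter, defaultdict

def train_kn_bigram(tokens, d=0.75):
    """Return P(w | h) with absolute discounting plus a Kneser-Ney
    continuation distribution for the redistributed mass:
    P(w | h) = max(count(h, w) - d, 0) / count(h) + alpha(h) * P_cont(w),
    where P_cont(w) is proportional to the number of distinct words w can follow."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    histories = Counter(tokens[:-1])          # count(h) as a bigram history
    continuation = defaultdict(set)           # w -> distinct left contexts of w
    for h, w in bigrams:
        continuation[w].add(h)
    total_cont = sum(len(s) for s in continuation.values())  # number of bigram types

    def prob(h, w):
        p_cont = len(continuation.get(w, ())) / total_cont if total_cont else 0.0
        if histories[h] == 0:                 # unseen history: use continuation prob only
            return p_cont
        discounted = max(bigrams[(h, w)] - d, 0) / histories[h]
        # alpha(h): mass freed by discounting, spread according to P_cont
        n_types_after_h = len({w2 for (h2, w2) in bigrams if h2 == h})
        alpha = d * n_types_after_h / histories[h]
        return discounted + alpha * p_cont

    return prob

# Tiny usage example
tokens = "the cat sat on the mat and the cat ran".split()
p = train_kn_bigram(tokens)
print(p("the", "cat"))   # frequent observed bigram
print(p("cat", "mat"))   # unseen bigram: only redistributed continuation mass
```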