http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html
these characters may come from a word (handwriting data)
sequence of observations => model the joint distribution over the whole sequence
linear chain CRF
usually => i.i.d. assumption across examples
but adjacent positions in a sequence depend on each other => linear chain CRF
first term: from x_k
second term: from V matrix
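for reference, a sketch of the two terms written out, assuming the usual linear chain CRF form. The names a_u (unary score), a_p (pairwise score) and the sequence length K are assumptions of this sketch, not taken from these notes:

```latex
p(\mathbf{y} \mid \mathbf{X})
  = \frac{1}{Z(\mathbf{X})}
    \exp\Big( \sum_{k=1}^{K} a_u(y_k) + \sum_{k=1}^{K-1} a_p(y_k, y_{k+1}) \Big)

% first term: depends on the input x_k
% second term: entry of the V matrix for the label pair (y_k, y_{k+1})
a_p(y_k, y_{k+1}) = V_{y_k,\, y_{k+1}}
```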
context window
three neural networks, contributing a(0), a(-1), a(+1)
alternative: only one NN
computing the partition function
y' ≠ y
y_k: labels of the given (target) sequence
y'_k: ranges over all possible label sequences (the sum inside Z(X))
the goal here is to calculate Z(X) in polynomial time (dynamic programming)
if someone gives me y'_2 then we can calculate \alpha_1(y'_2)
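a minimal sketch of that recursion in Python, computing log Z(X) in O(K C^2) instead of summing over all C^K sequences. Assumptions of this sketch: `unary[k, y]` holds the score of label y at position k, `pairwise[y, y_next]` holds the score of the label pair, and the function names are made up for illustration. Everything is done in log space to avoid overflow. The brute-force version is only there as a check on tiny inputs.

```python
import itertools
import numpy as np


def log_partition_forward(unary, pairwise):
    """log Z(X) via the forward (alpha) recursion.

    unary:    (K, C) array, unary[k, y] = score of label y at position k
    pairwise: (C, C) array, pairwise[y, y_next] = score of the pair (y, y_next)
    """
    K, C = unary.shape
    log_alpha = np.zeros(C)  # log alpha_0 = 0: nothing summed out yet
    for k in range(K - 1):
        # log alpha_{k+1}(y_next) =
        #   logsumexp over y of [unary[k, y] + log alpha_k(y) + pairwise[y, y_next]]
        scores = (unary[k] + log_alpha)[:, None] + pairwise
        log_alpha = np.logaddexp.reduce(scores, axis=0)
    # Sum out the last label together with its unary score.
    return np.logaddexp.reduce(unary[K - 1] + log_alpha)


def log_partition_brute_force(unary, pairwise):
    """Same quantity by enumerating all C**K label sequences (exponential)."""
    K, C = unary.shape
    totals = []
    for ys in itertools.product(range(C), repeat=K):
        s = sum(unary[k, ys[k]] for k in range(K))
        s += sum(pairwise[ys[k], ys[k + 1]] for k in range(K - 1))
        totals.append(s)
    return np.logaddexp.reduce(np.array(totals))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unary = rng.normal(size=(4, 3))      # K = 4 positions, C = 3 labels
    pairwise = rng.normal(size=(3, 3))
    print(log_partition_forward(unary, pairwise))
    print(log_partition_brute_force(unary, pairwise))  # should agree
```

with k = 0 the loop body is exactly the "given y'_2, compute \alpha_1(y'_2)" step: the first label is summed out for every possible value of the second.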
https://www.spaces.ac.cn/archives/5542/comment-page-1
advantage function????
a = max x_n
V(s) = max_a Q(a|s)
A(a|s) = Q(a|s) - V(s)
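a small sketch of these two definitions on a made-up Q table (rows = states, columns = actions). The function name and the numbers are hypothetical, only meant to show that A(a|s) <= 0 everywhere and is 0 for the greedy action:

```python
import numpy as np


def value_and_advantage(Q):
    """Q: (num_states, num_actions) array of Q(a|s) values.

    Returns V(s) = max_a Q(a|s) and A(a|s) = Q(a|s) - V(s).
    """
    V = Q.max(axis=1)
    A = Q - V[:, None]
    return V, A


if __name__ == "__main__":
    # Hypothetical values for 2 states and 3 actions.
    Q = np.array([[1.0, 3.0, 2.0],
                  [0.5, 0.0, -1.0]])
    V, A = value_and_advantage(Q)
    print(V)  # best achievable value in each state
    print(A)  # how much worse each action is than the best one
```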
computing marginals
performing classification
factors, sufficient statistics and linear CRF
Markov network
factor graph
an alternative visualization that removes the ambiguity of the Markov network (shows the factors explicitly)
belief propagation
Original post: https://www.cnblogs.com/ecoflex/p/10884617.html