[Neural Network] {Université de Sherbrooke} C3: Conditional Random Field


these characteristics may come from a word (handwriting data)

sequence of observations => model the labels of the whole sequence jointly, p(y | X)

linear chain CRF

usually => i.i.d. assumption (each position is classified independently)

but adjacent positions in a sequence depend on each other => linear chain CRF

first term: from x_k

second term: from V matrix
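
For reference, one common way to write the linear chain CRF that these two terms belong to is sketched below. The names a_u and a_p and the exact indexing are assumptions following the usual presentation; they are not taken from these notes.

```latex
p(\mathbf{y} \mid \mathbf{X})
  = \frac{1}{Z(\mathbf{X})}
    \exp\left( \sum_{k=1}^{K} a_u(y_k) + \sum_{k=1}^{K-1} a_p(y_k, y_{k+1}) \right),
\qquad
a_u(y_k) = a(\mathbf{x}_k)_{y_k},
\quad
a_p(y_k, y_{k+1}) = V_{y_k,\, y_{k+1}}
```

Here a(x_k) is the pre-softmax output of the network at position k (first term) and V scores each pair of adjacent labels (second term).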

context window

three neural networks (one per window position), contributing the terms a(0), a(-1), a(+1)

alternative: only one NN
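
A possible way to write the two options, assuming a window of one position on each side. The subscript notation and the concatenation form of the single-network alternative are assumptions, not something stated in these notes.

```latex
% three networks, one per window position
a_u(y_k) = a^{(0)}(\mathbf{x}_k)_{y_k}
         + a^{(-1)}(\mathbf{x}_{k-1})_{y_k}
         + a^{(+1)}(\mathbf{x}_{k+1})_{y_k}

% alternative: a single network on the concatenated window
a_u(y_k) = a\big([\mathbf{x}_{k-1}, \mathbf{x}_k, \mathbf{x}_{k+1}]\big)_{y_k}
```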

computing the partition function

y' ≠ y

y_k: the label at position k of the given (target) sequence

y'_k: ranges over all possible labels at position k (Z(X) sums over every possible sequence y')

the goal here is to calculate Z(X) in polynomial time (dynamic programming)

if someone gives me y'_2, then we can calculate \alpha_1(y'_2)
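
A minimal sketch of the dynamic program, assuming the scores are given as a K x C table `unary` (the per-position network outputs) and a C x C table `V` (the pairwise scores). The function names and toy numbers are hypothetical. The indexing differs slightly from \alpha_1(y'_2) above: here the table at position k already includes the unary score of position k. Everything is done in log space for numerical stability, and the result is checked against brute-force enumeration of all C^K sequences.

```python
import itertools
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(unary, V):
    """log Z(X) by the forward recursion, O(K * C^2) time."""
    K, C = len(unary), len(unary[0])
    # log_alpha[j]: log-sum of the scores of all prefixes ending with label j
    log_alpha = list(unary[0])
    for k in range(1, K):
        log_alpha = [
            unary[k][j] + logsumexp([log_alpha[i] + V[i][j] for i in range(C)])
            for j in range(C)
        ]
    return logsumexp(log_alpha)

def log_partition_brute_force(unary, V):
    """log Z(X) by enumerating all C^K label sequences (only for checking)."""
    K, C = len(unary), len(unary[0])
    scores = []
    for y in itertools.product(range(C), repeat=K):
        s = sum(unary[k][y[k]] for k in range(K))
        s += sum(V[y[k]][y[k + 1]] for k in range(K - 1))
        scores.append(s)
    return logsumexp(scores)

# Toy scores: K = 4 positions, C = 3 labels (made-up numbers)
unary = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.8, -0.2], [-0.4, 0.1, 0.9]]
V = [[0.2, -0.5, 0.1], [0.0, 0.4, -0.3], [-0.6, 0.3, 0.5]]

fast = log_partition(unary, V)
slow = log_partition_brute_force(unary, V)
print(abs(fast - slow) < 1e-9)
```

The brute-force version costs O(C^K), while the recursion costs O(K * C^2), which is the polynomial-time saving mentioned above.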


advantage function????

a = max x_n

V(s) = max_a Q(a|s)

A(a|s) = Q(a|s) - V(s)
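
A tiny numeric sketch of the two formulas above. The action names and Q-values are made up for illustration, and V(s) = max_a Q(a|s) is taken as written in these notes (it corresponds to acting greedily).

```python
# Hypothetical Q-values for a single state s
Q = {"left": 1.0, "right": 2.5, "stay": 2.0}

def state_value(q):
    """V(s) = max_a Q(a|s)."""
    return max(q.values())

def advantage(q):
    """A(a|s) = Q(a|s) - V(s) for every action a."""
    v = state_value(q)
    return {a: qa - v for a, qa in q.items()}

print(state_value(Q))  # 2.5
print(advantage(Q))    # {'left': -1.5, 'right': 0.0, 'stay': -0.5}
```

With this definition of V(s), the advantage is never positive and is zero exactly for the best action.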

computing marginals
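
A sketch of how the marginals p(y_k | X) could be computed with a forward and a backward pass, assuming the same hypothetical inputs as a K x C table `unary` and a C x C table `V`. The function and variable names are illustrative, not from the notes.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginals(unary, V):
    """Return a K x C table with p(y_k = c | X)."""
    K, C = len(unary), len(unary[0])
    # forward[k][j]: log-sum of scores of all prefixes ending with y_k = j
    forward = [list(unary[0])]
    for k in range(1, K):
        prev = forward[-1]
        forward.append([
            unary[k][j] + logsumexp([prev[i] + V[i][j] for i in range(C)])
            for j in range(C)
        ])
    # backward[k][i]: log-sum of scores of all suffixes after position k,
    # given y_k = i (the unary score of position k itself is not included)
    backward = [[0.0] * C for _ in range(K)]
    for k in range(K - 2, -1, -1):
        nxt = backward[k + 1]
        backward[k] = [
            logsumexp([V[i][j] + unary[k + 1][j] + nxt[j] for j in range(C)])
            for i in range(C)
        ]
    log_z = logsumexp(forward[-1])
    return [
        [math.exp(forward[k][c] + backward[k][c] - log_z) for c in range(C)]
        for k in range(K)
    ]

# Toy scores (made-up numbers): K = 3 positions, C = 2 labels
unary = [[0.5, -1.0], [0.3, 0.7], [-0.2, 0.4]]
V = [[0.2, -0.5], [0.0, 0.4]]
for row in marginals(unary, V):
    print(row, sum(row))  # each row should sum to 1 up to rounding
```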

performing classification
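
One option for classification is to return the single most probable label sequence, which can be found with the Viterbi recursion (the same dynamic program as for Z(X), with max in place of sum). The sketch below assumes the same hypothetical `unary` and `V` score tables; picking the most probable label at each position from the marginals is another option and is not shown here.

```python
import itertools

def sequence_score(y, unary, V):
    """Unnormalized log-score of one label sequence y."""
    s = sum(unary[k][y[k]] for k in range(len(y)))
    return s + sum(V[y[k]][y[k + 1]] for k in range(len(y) - 1))

def viterbi(unary, V):
    """Most probable label sequence, O(K * C^2) time."""
    K, C = len(unary), len(unary[0])
    best = list(unary[0])  # best[j]: best score of a prefix ending with label j
    backpointers = []
    for k in range(1, K):
        new_best, pointers = [], []
        for j in range(C):
            i_star = max(range(C), key=lambda i: best[i] + V[i][j])
            pointers.append(i_star)
            new_best.append(best[i_star] + V[i_star][j] + unary[k][j])
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final label
    path = [max(range(C), key=lambda c: best[c])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def brute_force_best(unary, V):
    """Same answer by enumerating all C^K sequences (only for checking)."""
    K, C = len(unary), len(unary[0])
    return list(max(itertools.product(range(C), repeat=K),
                    key=lambda y: sequence_score(y, unary, V)))

# Toy scores (made-up numbers): K = 4 positions, C = 3 labels
unary = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.8, -0.2], [-0.4, 0.1, 0.9]]
V = [[0.2, -0.5, 0.1], [0.0, 0.4, -0.3], [-0.6, 0.3, 0.5]]
print(viterbi(unary, V) == brute_force_best(unary, V))
```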

factors, sufficient statistics and linear CRF

Markov network

factor graph

another visualization to get rid of the ambiguity.

belief propagation


Time: 2024-10-18 18:34:30

