Markov Random Fields

We have seen that directed graphical models specify a factorization of the joint distribution over a set of variables into a product of local conditional distributions. They also define a set of conditional independence properties that must be satisfied by any distribution that factorizes according to the graph. We turn now to the second major class of graphical models that are described by undirected graphs and that again specify both a factorization and a set of conditional independence relations.

A Markov random field, also known as a Markov network or an undirected graphical model, has a set of nodes each of which corresponds to a variable or group of variables, as well as a set of links each of which connects a pair of nodes. The links are undirected, that is they do not carry arrows. In the case of undirected graphs, it is convenient to begin with a discussion of conditional independence properties.

8.3.1 Conditional independence properties

In the case of directed graphs, we saw that it was possible to test whether a particular conditional independence property holds by applying a graphical test called d-separation. This involved testing whether or not the paths connecting two sets of nodes were ‘blocked’. The definition of blocked, however, was somewhat subtle due to the presence of paths having head-to-head nodes. We might ask whether it is possible to define an alternative graphical semantics for probability distributions such that conditional independence is determined by simple graph separation. This is indeed the case and corresponds to undirected graphical models. By removing the directionality from the links of the graph, the asymmetry between parent and child nodes is removed, and so the subtleties associated with head-to-head nodes no longer arise.

Suppose that in an undirected graph we identify three sets of nodes, denoted A, B, and C, and we consider the conditional independence property

$$A \perp\!\!\!\perp B \mid C \qquad (1)$$

To test whether this property is satisfied by a probability distribution defined by a graph, we consider all possible paths that connect nodes in set A to nodes in set B. If all such paths pass through one or more nodes in set C, then all such paths are ‘blocked’ and so the conditional independence property holds. However, if there is at least one such path that is not blocked, then the property does not necessarily hold, or more precisely there will exist at least some distributions corresponding to the graph that do not satisfy this conditional independence relation. Note that this is exactly the same as the d-separation criterion except that there is no ‘explaining away’ phenomenon. Testing for conditional independence in undirected graphs is therefore simpler than in directed graphs.

An alternative way to view the conditional independence test is to imagine removing all the nodes in set C from the graph, together with any links that connect to those nodes, and then asking whether there exists a path that connects any node in A to any node in B. If there are no such paths, then the conditional independence property must hold.
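As a concrete sketch of this test, graph separation can be checked with a breadth-first search that is forbidden from entering the conditioning set C. The graph and node names below are illustrative, not from the text.

```python
from collections import deque

def separated(adj, A, B, C):
    """Graph-separation test: are node sets A and B separated by C?

    adj maps each node to the set of its neighbours in the undirected
    graph.  We search outward from A but never enter a node of C;
    A and B are separated iff the search never reaches B.
    """
    C = set(C)
    seen = set(A) - C
    frontier = deque(seen)
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False          # found an unblocked path from A to B
        for v in adj[u]:
            if v not in seen and v not in C:
                seen.add(v)
                frontier.append(v)
    return True

# A small chain a - c - b: conditioning on c blocks every path.
adj = {'a': {'c'}, 'c': {'a', 'b'}, 'b': {'c'}}
print(separated(adj, {'a'}, {'b'}, {'c'}))   # True
print(separated(adj, {'a'}, {'b'}, set()))   # False
```

Because there is no ‘explaining away’, this single traversal is all that is needed; no special handling of head-to-head nodes arises.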

The Markov blanket for an undirected graph takes a particularly simple form, because a node will be conditionally independent of all other nodes conditioned only on the neighbouring nodes.

8.3.2 Factorization properties

We now seek a factorization rule for undirected graphs that will correspond to the above conditional independence test. Again, this will involve expressing the joint distribution as a product of functions defined over sets of variables that are local to the graph. We therefore need to decide what is the appropriate notion of locality in this case.

If we consider two nodes $x_i$ and $x_j$ that are not connected by a link, then these variables must be conditionally independent given all other nodes in the graph. This follows from the fact that there is no direct link between the two nodes, and all other paths between them pass through nodes that are observed, and hence those paths are blocked. This conditional independence property can be expressed as

$$p(x_i, x_j \mid \mathbf{x}_{\setminus \{i,j\}}) = p(x_i \mid \mathbf{x}_{\setminus \{i,j\}})\, p(x_j \mid \mathbf{x}_{\setminus \{i,j\}}) \qquad (1)$$

where $\mathbf{x}_{\setminus \{i,j\}}$ denotes the set of all variables $\mathbf{x}$ with $x_i$ and $x_j$ removed. The factorization of the joint distribution must therefore be such that $x_i$ and $x_j$ do not appear in the same factor in order for the conditional independence property to hold for all possible distributions belonging to the graph.

This leads us to consider a graphical concept called a clique, which is defined as a subset of the nodes in a graph such that there exists a link between all pairs of nodes in the subset. In other words, the set of nodes in a clique is fully connected. Furthermore, a maximal clique is a clique such that it is not possible to include any other nodes from the graph in the set without it ceasing to be a clique. These concepts are illustrated by an undirected graph over four variables $x_1, \ldots, x_4$. This graph has five cliques of two nodes given by $\{x_1, x_2\}$, $\{x_1, x_3\}$, $\{x_2, x_3\}$, $\{x_2, x_4\}$, and $\{x_3, x_4\}$, as well as two maximal cliques given by $\{x_1, x_2, x_3\}$ and $\{x_2, x_3, x_4\}$. The set $\{x_1, x_2, x_3, x_4\}$ is not a clique because of the missing link from $x_1$ to $x_4$.
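The clique structure of such an example can be verified by brute force. The sketch below uses an edge list chosen to match the description (five two-node cliques, two maximal cliques over four nodes); it enumerates all cliques and keeps the maximal ones.

```python
from itertools import combinations

# Edges of the illustrative four-node graph; the link (1, 4) is missing.
edges = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]}
nodes = [1, 2, 3, 4]

def is_clique(S):
    """Every pair of nodes in S must be joined by a link."""
    return all(frozenset(p) in edges for p in combinations(S, 2))

# All cliques of two or more nodes.
cliques = [set(S) for r in range(2, len(nodes) + 1)
           for S in combinations(nodes, r) if is_clique(S)]

# A clique is maximal if it is not a proper subset of another clique.
maximal = [S for S in cliques if not any(S < T for T in cliques)]
print(maximal)   # [{1, 2, 3}, {2, 3, 4}]
```

This exhaustive enumeration is exponential in the number of nodes, so it is only a didactic check, not a practical algorithm.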

We can therefore define the factors in the decomposition of the joint distribution to be functions of the variables in the cliques. In fact, we can consider functions of the maximal cliques, without loss of generality, because other cliques must be subsets of maximal cliques. Thus, if $\mathbf{x}_C$ is the set of variables in a maximal clique and we define an arbitrary function over this clique, then including another factor defined over a subset of these variables would be redundant.

Let us denote a clique by $C$ and the set of variables in that clique by $\mathbf{x}_C$. Then the joint distribution is written as a product of potential functions $\psi_C(\mathbf{x}_C)$ over the maximal cliques of the graph

$$p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C) \qquad (2)$$

Here the quantity $Z$, sometimes called the partition function, is a normalization constant and is given by

$$Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C) \qquad (3)$$

which ensures that the distribution given by (2) is correctly normalized. By considering only potential functions that satisfy $\psi_C(\mathbf{x}_C) \geq 0$, we ensure that $p(\mathbf{x}) \geq 0$. In (3) we have assumed that $\mathbf{x}$ comprises discrete variables, but the framework is equally applicable to continuous variables, or a combination of the two, in which case the summation is replaced by the appropriate combination of summation and integration.
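A minimal numerical sketch of (2) and (3), assuming three binary variables and two hypothetical clique potentials (the potentials are made up for illustration, not taken from the text): the partition function is obtained by summing the product of potentials over every joint configuration.

```python
from itertools import product
import math

# Two made-up pairwise potentials over cliques {x1, x2} and {x2, x3}.
def psi_12(x1, x2):
    return math.exp(2.0 if x1 == x2 else 0.0)   # favours agreement

def psi_23(x2, x3):
    return math.exp(1.0 if x2 == x3 else 0.0)

def unnormalized(x):
    """Product of clique potentials, i.e. Z * p(x) in equation (2)."""
    x1, x2, x3 = x
    return psi_12(x1, x2) * psi_23(x2, x3)

# Equation (3): sum over all 2^3 joint configurations.
Z = sum(unnormalized(x) for x in product([0, 1], repeat=3))

def p(x):
    """Normalized joint distribution, equation (2)."""
    return unnormalized(x) / Z

total = sum(p(x) for x in product([0, 1], repeat=3))
print(round(total, 10))   # 1.0
```

The explicit sum over all configurations is exactly the operation that becomes intractable as the number of variables grows, which is the point made in the following paragraphs.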

Note that we do not restrict the choice of potential functions to those that have a specific probabilistic interpretation as marginal or conditional distributions. This is in contrast to directed graphs in which each factor represents the conditional distribution of the corresponding variable, conditioned on the state of its parents. However, in special cases, for instance where the undirected graph is constructed by starting with a directed graph, the potential functions may indeed have such an interpretation, as we shall see shortly.

One consequence of the generality of the potential functions is that their product will in general not be correctly normalized. We therefore have to introduce an explicit normalization factor given by (3). Recall that for directed graphs, the joint distribution was automatically normalized as a consequence of the normalization of each of the conditional distributions in the factorization.

The presence of this normalization constant is one of the major limitations of undirected graphs. If we have a model with $M$ discrete nodes each having $K$ states, then the evaluation of the normalization term involves summing over $K^M$ states and so (in the worst case) is exponential in the size of the model. The partition function is needed for parameter learning because it will be a function of any parameters that govern the potential functions $\psi_C(\mathbf{x}_C)$. However, for the evaluation of local conditional distributions, the partition function is not needed because a conditional is the ratio of two marginals, and the partition function cancels between numerator and denominator when evaluating this ratio. Similarly, for evaluating local marginal probabilities we can work with the unnormalized joint distribution and then normalize the marginals explicitly at the end. Provided the marginals involve only a small number of variables, the evaluation of their normalization coefficients will be feasible.
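The cancellation of the partition function can be seen in a small sketch (the pairwise potential below is made up for illustration): the conditional is a ratio of two sums of the unnormalized joint, so $Z$ never needs to be computed.

```python
import math

# A made-up unnormalized joint over two binary variables:
# tilde_p(x1, x2) = Z * p(x1, x2) for some unknown Z.
def unnormalized(x1, x2):
    return math.exp(1.5 * x1 * x2 + 0.3 * x1)

states = [0, 1]

def conditional(x1, x2):
    """p(x1 | x2) = tilde_p(x1, x2) / sum_{x1'} tilde_p(x1', x2).

    Z appears in both numerator and denominator and cancels.
    """
    return unnormalized(x1, x2) / sum(unnormalized(a, x2) for a in states)

# The conditional is correctly normalized without ever evaluating Z.
total = sum(conditional(a, 1) for a in states)
print(round(total, 10))   # 1.0
```

The same trick underlies Gibbs sampling and other local computations in undirected models: only ratios of the unnormalized distribution are ever required.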

So far, we have discussed the notion of conditional independence based on simple graph separation, and we have proposed a factorization of the joint distribution that is intended to correspond to this conditional independence structure. However, we have not made any formal connection between conditional independence and factorization for undirected graphs. To do so we need to restrict attention to potential functions $\psi_C(\mathbf{x}_C)$ that are strictly positive (i.e., never zero or negative for any choice of $\mathbf{x}_C$). Given this restriction, we can make a precise relationship between factorization and conditional independence.

To do this we again return to the concept of a graphical model as a filter. Consider the set of all possible distributions defined over a fixed set of variables corresponding to the nodes of a particular undirected graph. We can define $\mathcal{UI}$ to be the set of such distributions that are consistent with the set of conditional independence statements that can be read from the graph using graph separation. Similarly, we can define $\mathcal{UF}$ to be the set of such distributions that can be expressed as a factorization of the form (2) with respect to the maximal cliques of the graph. The Hammersley-Clifford theorem states that the sets $\mathcal{UI}$ and $\mathcal{UF}$ are identical.

Because we are restricted to potential functions that are strictly positive, it is convenient to express them as exponentials, so that

$$\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\} \qquad (4)$$

where $E(\mathbf{x}_C)$ is called an energy function, and the exponential representation is called the Boltzmann distribution. The joint distribution is defined as the product of potentials, and so the total energy is obtained by adding the energies of each of the maximal cliques.
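A small sketch of the Boltzmann representation (4), with made-up clique energies: the product of potentials equals the exponential of minus the summed energy.

```python
import math

# Made-up energy functions for two cliques; low energy = preferred state.
def E_12(x1, x2):
    return 0.0 if x1 == x2 else 2.0

def E_23(x2, x3):
    return 0.0 if x2 == x3 else 1.0

def total_energy(x1, x2, x3):
    """Total energy is the sum of the clique energies."""
    return E_12(x1, x2) + E_23(x2, x3)

# Product of potentials exp(-E_C) equals exp of minus the total energy.
x = (1, 1, 0)
prod = math.exp(-E_12(x[0], x[1])) * math.exp(-E_23(x[1], x[2]))
assert abs(prod - math.exp(-total_energy(*x))) < 1e-12
print(round(prod, 4))   # 0.3679, i.e. exp(-1)
```

This additivity of energies is why the exponential form is convenient: multiplying potentials corresponds to summing local energy contributions.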

In contrast to the factors in the joint distribution for a directed graph, the potentials in an undirected graph do not have a specific probabilistic interpretation. Although this gives greater flexibility in choosing the potential functions, because there is no normalization constraint, it does raise the question of how to motivate a choice of potential function for a particular application. This can be done by viewing the potential function as expressing which configurations of the local variables are preferred to others. Global configurations that have a relatively high probability are those that find a good balance in satisfying the (possibly conflicting) influences of the clique potentials.

8.3.4 Relation to directed graphs

We have introduced two graphical frameworks for representing probability distributions, corresponding to directed and undirected graphs, and it is instructive to discuss the relation between these. Consider first the problem of taking a model that is specified using a directed graph and trying to convert it to an undirected graph. In some cases this is straightforward, for example in the case of a simple directed chain over nodes $x_1, \ldots, x_N$. Here the joint distribution for the directed graph is given as a product of conditionals in the form

$$p(\mathbf{x}) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2) \cdots p(x_N \mid x_{N-1}) \qquad (1)$$

Now let us convert this to an undirected graph representation. In the undirected graph, the maximal cliques are simply the pairs of neighbouring nodes, and so from (2) we wish to write the joint distribution in the form

$$p(\mathbf{x}) = \frac{1}{Z}\, \psi_{1,2}(x_1, x_2)\, \psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N) \qquad (2)$$

Let us consider how to generalize this construction, so that we can convert any distribution specified by a factorization over a directed graph into one specified by a factorization over an undirected graph. This can be achieved if the clique potentials of the undirected graph are given by the conditional distributions of the directed graph. In order for this to be valid, we must ensure that the set of variables that appears in each of the conditional distributions is a member of at least one clique of the undirected graph. For nodes of the directed graph having just one parent, this is achieved simply by replacing the directed link with an undirected link. However, for nodes in the directed graph having more than one parent, this is not sufficient. These are the nodes that give rise to the ‘head-to-head’ paths encountered in our discussion of conditional independence. Consider, for example, a directed graph in which a node $x_4$ has three parents $x_1$, $x_2$, and $x_3$. The joint distribution for the directed graph takes the form

$$p(\mathbf{x}) = p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3) \qquad (3)$$

We see that the factor $p(x_4 \mid x_1, x_2, x_3)$ involves the four variables $x_1$, $x_2$, $x_3$, and $x_4$, and so these must all belong to a single clique if this conditional distribution is to be absorbed into a clique potential. To ensure this, we add extra links between all pairs of parents of the node $x_4$. Anachronistically, this process of ‘marrying the parents’ has become known as moralization, and the resulting undirected graph, after dropping the arrows, is called the moral graph. It is important to observe that the moral graph in this example is fully connected and so exhibits no conditional independence properties, in contrast to the original directed graph.

Thus in general to convert a directed graph into an undirected graph, we first add additional undirected links between all pairs of parents for each node in the graph and then drop the arrows on the original links to give the moral graph. Then we initialize all of the clique potentials of the moral graph to 1. We then take each conditional distribution factor in the original directed graph and multiply it into one of the clique potentials. There will always exist at least one maximal clique that contains all of the variables in the factor as a result of the moralization step. Note that in all cases the partition function is given by $Z = 1$.
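The moralization step described above can be sketched as follows (the graph representation and node names are illustrative):

```python
from itertools import combinations

def moralize(parents):
    """Moralize a directed graph given as {node: set of its parents}.

    Marry all pairs of parents of each node, then drop the arrows.
    Returns the undirected edge set of the moral graph.
    """
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))     # drop arrow on original link
        for p, q in combinations(sorted(pa), 2):
            edges.add(frozenset((p, q)))         # marry the parents
    return edges

# The head-to-head example from the text: x4 has parents x1, x2, x3.
parents = {'x1': set(), 'x2': set(), 'x3': set(), 'x4': {'x1', 'x2', 'x3'}}
moral = moralize(parents)
print(len(moral))   # 6 -- the moral graph over four nodes is fully connected
```

The six undirected edges here are all possible pairs of four nodes, confirming that moralizing this graph discards all of its conditional independence properties.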

The process of converting a directed graph into an undirected graph plays an important role in exact inference techniques such as the junction tree algorithm. Converting from an undirected to a directed representation is much less common and in general presents problems due to the normalization constraints.

We saw that in going from a directed to an undirected representation we had to discard some conditional independence properties from the graph. Of course, we could always trivially convert any distribution over a directed graph into one over an undirected graph by simply using a fully connected undirected graph. This would, however, discard all conditional independence properties and so would be vacuous. The process of moralization adds the fewest extra links and so retains the maximum number of independence properties.

We have seen that the procedure for determining the conditional independence properties is different between directed and undirected graphs. It turns out that the two types of graph can express different conditional independence properties, and it is worth exploring this issue in more detail. To do so, we return to the view of a specific (directed or undirected) graph as a filter, so that the set of all possible distributions over the given variables can be reduced to a subset that respects the conditional independencies implied by the graph. A graph is said to be a D map (for ‘dependency map’) of a distribution if every conditional independence statement satisfied by the distribution is reflected in the graph. Thus a completely disconnected graph (no links) will be a trivial D map for any distribution.
