Translated by Luo Fangwei (罗方炜):
Earth mover’s distance
In computer science, the earth mover’s distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, it is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other, where the cost is assumed to be the amount of dirt moved times the distance by which it is moved [1].
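For a discrete region D, the informal definition becomes a linear program. This is a sketch in notation introduced here for illustration. The first pile holds mass p_i at location x_i (i = 1 to m), the second holds mass q_j at location y_j (j = 1 to n), d_{ij} is the distance between x_i and y_j, and f_{ij} is the amount of dirt moved from x_i to y_j. Both piles are assumed to have the same total mass:

```latex
\mathrm{EMD}(P, Q) = \min_{f_{ij} \ge 0} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij}
\quad \text{subject to} \quad
\sum_{j=1}^{n} f_{ij} = p_i \;\; (1 \le i \le m), \qquad
\sum_{i=1}^{m} f_{ij} = q_j \;\; (1 \le j \le n).
```

For example, if a single unit of dirt must travel from x_1 to y_1 and d_{11} = 3, the only feasible flow is f_{11} = 1, and the EMD is 3.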
The above definition is valid only if the two distributions have the same integral (informally, if the two piles contain the same amount of dirt), as is the case for normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance, or 1st Wasserstein distance, between the two distributions [2] [3].
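For reference, the 1st Wasserstein distance takes an infimum over couplings γ. A coupling is a joint distribution on D × D whose marginals are P and Q, and d is the ground distance. For distributions on the real line, the distance reduces to the area between the two cumulative distribution functions F_P and F_Q. This is the quantity that the bin-scanning method for one-dimensional arrays accumulates. The symbols P, Q, Γ, and F are notation introduced here for illustration:

```latex
W_1(P, Q) = \inf_{\gamma \in \Gamma(P, Q)} \int_{D \times D} d(x, y)\, \mathrm{d}\gamma(x, y),
\qquad
W_1(P, Q) = \int_{-\infty}^{\infty} \bigl\lvert F_P(x) - F_Q(x) \bigr\rvert \, \mathrm{d}x
\quad (D = \mathbb{R}).
```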
Extensions
Some applications may require comparing distributions with different total masses. One approach is to allow a partial match. Here, dirt from the more massive distribution is rearranged to form the less massive one, and any leftover “dirt” is discarded at no cost. Under this approach, the EMD is no longer a true distance between distributions. Another approach is to allow mass to be created or destroyed, at a global and/or local level, as an alternative to transportation but with a cost penalty. In that case, one must specify a real parameter σ, defined as the ratio between the cost of creating or destroying one unit of “dirt” and the cost of transporting it by a unit distance. This is equivalent to minimizing the earth-moving cost plus σ times the L1 distance between the rearranged pile and the second distribution.
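Both extensions can be written down as a sketch. The notation is introduced here for illustration. Masses p_i and q_j sit at two sets of locations, d_{ij} is the distance between them, and f_{ij} is the dirt moved from the i-th to the j-th location. P′ is a hypothetical name for the rearranged pile, which has the same total mass as P, and EMD(P, P′) is the cost of moving P into it.

In the partial match, no source gives more dirt than it holds and no destination receives more than it needs. Exactly the smaller total mass is moved. In the penalized version, σ prices each unit of L1 mismatch between P′ and Q:

```latex
\text{Partial match:}\quad
\min_{f_{ij} \ge 0} \sum_{i,j} f_{ij}\, d_{ij}
\quad \text{s.t.} \quad
\sum_{j} f_{ij} \le p_i, \qquad
\sum_{i} f_{ij} \le q_j, \qquad
\sum_{i,j} f_{ij} = \min\Bigl(\sum_i p_i,\; \sum_j q_j\Bigr)

\text{Creation/destruction penalty:}\quad
\min_{P'} \Bigl[\, \mathrm{EMD}(P, P') + \sigma \,\lVert P' - Q \rVert_1 \Bigr]
```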
Computing the EMD
If the domain D is discrete, the EMD can be computed by solving an instance of the transportation problem, which can be solved by the so-called Hungarian algorithm. In particular, if D is a one-dimensional array of “bins”, the EMD can be computed efficiently by scanning the array and keeping track of how much dirt needs to be transported between consecutive bins.
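Below is a minimal Python sketch of both routes, under stated assumptions.

For the general discrete case, `transport_cost` solves the transportation problem by successive shortest augmenting paths, using Bellman-Ford on a residual graph. This is a standard min-cost-flow technique swapped in here for illustration; it is not the Hungarian algorithm named above. The argument `dist` is assumed to be a matrix of ground distances, where `dist[i][j]` is the distance from the i-th source bin to the j-th destination bin.

For the one-dimensional case, `emd_1d` assumes unit spacing between consecutive bins and equal total mass. It scans the bins and adds up how much dirt crosses each boundary.

Both function names are hypothetical. Masses are best given as integers or exactly representable values.

```python
from math import inf

EPS = 1e-12  # residual capacities below this count as exhausted (float round-off guard)


def transport_cost(supply, demand, dist):
    """Minimum transport cost between two discrete piles of dirt.

    Builds the network source -> supply bins -> demand bins -> sink and runs
    successive shortest paths. It ships min(sum(supply), sum(demand)) units,
    so with unequal masses the surplus stays in place at no cost.
    """
    n, m = len(supply), len(demand)
    src, snk, size = n + m, n + m + 1, n + m + 2
    graph = [[] for _ in range(size)]  # indices of edges leaving each node
    head, cap, cost = [], [], []       # per edge: target node, residual capacity, cost

    def add_edge(u, v, c, w):
        # Edge e and its residual twin e ^ 1 are stored next to each other.
        for a, b, cc, ww in ((u, v, c, w), (v, u, 0, -w)):
            graph[a].append(len(head))
            head.append(b)
            cap.append(cc)
            cost.append(ww)

    for i, s in enumerate(supply):
        add_edge(src, i, s, 0)
    for j, d in enumerate(demand):
        add_edge(n + j, snk, d, 0)
    for i in range(n):
        for j in range(m):
            add_edge(i, n + j, inf, dist[i][j])

    total = 0
    while True:
        # Bellman-Ford shortest path: residual (reverse) edges can have negative cost.
        best, via = [inf] * size, [-1] * size
        best[src] = 0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if best[u] == inf:
                    continue
                for e in graph[u]:
                    v = head[e]
                    if cap[e] > EPS and best[u] + cost[e] < best[v]:
                        best[v], via[v], changed = best[u] + cost[e], e, True
            if not changed:
                break
        if best[snk] == inf:
            return total  # no augmenting path left: maximum flow reached
        push, v = inf, snk  # bottleneck capacity along the shortest path
        while v != src:
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        v = snk
        while v != src:  # augment along the path and update residual twins
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        total += push * best[snk]


def emd_1d(p, q):
    """EMD between equal-mass histograms on unit-spaced 1-D bins, in one pass."""
    if len(p) != len(q):
        raise ValueError("histograms need the same number of bins")
    carried = 0  # signed dirt that must cross from the current bin to the next
    total = 0
    for pk, qk in zip(p, q):
        carried += pk - qk
        total += abs(carried)
    return total
```

Consider `transport_cost([3, 0], [0, 1], [[0, 2], [2, 0]])`. The second pile needs one unit, so that unit is moved a distance of 2, and the other two units stay where they are. The result is 2, which matches the partial-match behavior of the Extensions section.

On 1-D bins with `dist[i][j] = abs(i - j)`, both functions return the same value for equal masses. For example, both return 4 for p = [3, 1, 0, 2] and q = [1, 2, 2, 1]. The scan needs only one pass over the bins, while the general solver runs repeated shortest-path searches.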
External links
- C code for the Earth Mover’s Distance
- C++, Matlab, and Java wrapper code for the Earth Mover’s Distance, especially efficient for thresholded ground distances
References
- [1] Formal definition.
- [2] Elizaveta Levina; Peter Bickel (2001). “The Earth Mover’s Distance is the Mallows Distance: Some Insights from Statistics”. Proceedings of ICCV 2001 (Vancouver, Canada): 251–256.
- [3] C. L. Mallows (1972). “A note on asymptotic joint normality”. Annals of Mathematical Statistics 43 (2): 508–515. doi:10.1214/aoms/1177692631.
- [4] S. Peleg; M. Werman; H. Rom (1989). “A unified approach to the change of resolution: Space and gray-level”. IEEE Transactions on Pattern Analysis and Machine Intelligence 11: 739–742. doi:10.1109/34.192468.
- [5] “Mémoire sur la théorie des déblais et des remblais”. Histoire de l’Académie Royale des Sciences, Année 1781, avec les Mémoires de Mathématique et de Physique. 1781.
- [6] J. Stolfi, personal communication to L. J. Guibas, 1994.
- [7] Yossi Rubner; Carlo Tomasi; Leonidas J. Guibas (1998). “A Metric for Distributions with Applications to Image Databases”. Proceedings ICCV 1998: 59–66.
Reposted from: http://en.wikipedia.org/wiki/Earth_mover's_distance
http://en.wikipedia.org/wiki/Transportation_problem
[Repost][Translation] A distance metric: the Earth Mover's Distance (EMD)