
In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other, where the cost is assumed to be the amount of dirt moved times the distance by which it is moved.
The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions.
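As a small worked illustration (the numbers are invented for this example and do not come from any source), take D to be the real line, let the first pile place all of its unit mass at position 0, and let the second pile place half of its mass at position 1 and half at position 3. The only way to turn the first pile into the second is to move 0.5 units over distance 1 and 0.5 units over distance 3, so the cost is:

```latex
\mathrm{EMD} = 0.5 \cdot |1 - 0| + 0.5 \cdot |3 - 0| = 0.5 + 1.5 = 2
```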
Theory
Assume that we have a set of points in $\mathbb{R}^d$ (dimension $d$). Instead of assigning one distribution to the set of points, we can cluster them and represent the point set in terms of the clusters. Thus, each cluster is a single point in $\mathbb{R}^d$, and the weight of the cluster is determined by the fraction of the distribution present in that cluster. This representation of a distribution by a set of clusters is called the signature. Two signatures can have different sizes; for example, a bimodal distribution has a shorter signature (2 clusters) than a more complex one. One cluster representative (the mean or mode in $\mathbb{R}^d$) can be thought of as a single feature in a signature. The distance between individual features is called the ground distance.
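A minimal sketch of building a signature, restricted to one dimension for brevity. The helper name `build_signature`, the `cell_size` parameter, and the use of fixed-width grid cells in place of a real clustering algorithm are all assumptions made for illustration.

```python
from collections import defaultdict

def build_signature(points, cell_size):
    """Group 1-D points into grid cells and return [(representative, weight), ...].

    Hypothetical helper: fixed-width binning stands in for a real clustering step.
    """
    cells = defaultdict(list)
    for x in points:
        cells[int(x // cell_size)].append(x)   # index of the grid cell holding x
    total = len(points)
    signature = []
    for index in sorted(cells):
        members = cells[index]
        mean = sum(members) / len(members)      # cluster representative
        weight = len(members) / total           # fraction of the distribution
        signature.append((mean, weight))
    return signature

# A bimodal sample collapses to a two-cluster signature.
sample = [0.9, 1.0, 1.1, 5.0, 5.2]
print(build_signature(sample, cell_size=2.0))
```

Each pair holds a cluster mean and the fraction of points in that cluster, so the weights of a signature built this way sum to 1.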
The EMD can be formulated and solved as a transportation problem. Suppose that several suppliers, each with a given amount of goods, are required to supply several consumers, each with a given limited capacity. For each supplier-consumer pair, the cost of transporting a single unit of goods is given. The transportation problem is then to find the least expensive flow of goods from the suppliers to the consumers that satisfies the consumers' demand. Similarly, here the problem is to transform one signature ($P$) into another ($Q$) with the minimum work done.
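Assume that signature $P$ has $m$ clusters, $P = \{(p_1, w_{p_1}), \dots, (p_m, w_{p_m})\}$, where $p_i$ is the cluster representative and $w_{p_i}$ is the weight of the cluster, and that signature $Q = \{(q_1, w_{q_1}), \dots, (q_n, w_{q_n})\}$ has $n$ clusters. Let $d_{i,j}$ be the ground distance between clusters $p_i$ and $q_j$. We want to find a flow $F = [f_{i,j}]$, with $f_{i,j}$ the flow between $p_i$ and $q_j$, that minimizes the overall cost:

$$\min_{F} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j} \, d_{i,j}$$

subject to the constraints

$$f_{i,j} \ge 0, \qquad \sum_{j=1}^{n} f_{i,j} \le w_{p_i}, \qquad \sum_{i=1}^{m} f_{i,j} \le w_{q_j}, \qquad \sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j} = \min\left( \sum_{i=1}^{m} w_{p_i}, \; \sum_{j=1}^{n} w_{q_j} \right)$$

The optimal flow $F$ is found by solving this linear optimization problem. The earth mover's distance is defined as the work normalized by the total flow:

$$\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j} \, d_{i,j}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j}}$$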
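The normalization step can be illustrated with a short sketch that takes a candidate flow, checks it against the usual transportation constraints (non-negative flows, no cluster ships or receives more than its weight, total flow equal to the smaller total weight), and returns the work divided by the total flow. The function name `emd_from_flow` and the example numbers are assumptions; the function does not search for the optimal flow, so the result is the EMD only when the supplied flow is optimal.

```python
def emd_from_flow(flow, dist, supply, demand, tol=1e-9):
    """Return total work / total flow for a candidate flow matrix.

    flow[i][j] is the mass moved from cluster i of P to cluster j of Q and
    dist[i][j] is the ground distance between them. Raises ValueError if the
    flow violates the constraints of the transportation problem.
    """
    m, n = len(supply), len(demand)
    if any(flow[i][j] < -tol for i in range(m) for j in range(n)):
        raise ValueError("flows must be non-negative")
    for i in range(m):                      # a cluster cannot ship more than it holds
        if sum(flow[i]) > supply[i] + tol:
            raise ValueError("row %d exceeds its weight" % i)
    for j in range(n):                      # a cluster cannot receive more than it holds
        if sum(flow[i][j] for i in range(m)) > demand[j] + tol:
            raise ValueError("column %d exceeds its weight" % j)
    total_flow = sum(sum(row) for row in flow)
    if abs(total_flow - min(sum(supply), sum(demand))) > tol:
        raise ValueError("total flow must equal the smaller total weight")
    work = sum(flow[i][j] * dist[i][j] for i in range(m) for j in range(n))
    return work / total_flow

# Clusters of P at 1.0 and 5.1, clusters of Q at 2.0 and 4.0 (made-up values).
dist = [[1.0, 3.0], [3.1, 1.1]]
flow = [[0.5, 0.1], [0.0, 0.4]]
print(emd_from_flow(flow, dist, supply=[0.6, 0.4], demand=[0.5, 0.5]))
```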
Extensions
Some applications may require the comparison of distributions with different total masses. One approach is to allow a partial match, where dirt from the larger distribution is rearranged to make the smaller one, and any leftover dirt is discarded at no cost. Under this approach, the EMD is no longer a true distance between distributions.
Another approach is to allow mass to be created or destroyed, on a global and/or local level, as an alternative to transportation, but with a cost penalty. In that case one must specify a real parameter $\sigma$, the ratio between the cost of creating or destroying one unit of "dirt" and the cost of transporting it over a unit distance. This is equivalent to minimizing the sum of the earth-moving cost plus $\sigma$ times the L1 distance between the rearranged pile and the second distribution.
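Notationally, if $\phi \colon P' \to Q'$ is a partial function that is a bijection between subsets $P' \subset P$ and $Q' \subset Q$, then one is interested in the distance function

$$d(P, Q) = C(\phi) + \sigma \left( \|P - P'\|_1 + \|Q - Q'\|_1 \right)$$

where $C(\phi)$ is the earth-moving cost of carrying $P'$ onto $Q'$ via $\phi$, $\sigma$ is the ratio between the cost of creating or destroying a unit of dirt and the cost of transporting it over a unit distance, and $-$ denotes set minus. Here, $P'$ is the portion of the earth that is moved; thus $P - P'$ is the portion not moved, and $\|P - P'\|_1$ is the size of the pile not moved. By symmetry, one regards $Q'$ as the pile at the destination that "got there" from $P$, as compared to the total $Q$ that we want to have there. Formally, this distance indicates how much an injective correspondence differs from an isomorphism.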
Computing the EMD
The EMD can be computed by solving an instance of the transportation problem, using any algorithm for the minimum-cost flow problem, e.g. the network simplex algorithm.
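As a rough sketch of the minimum-cost flow route, the following pure-Python function solves a small transportation instance with successive shortest augmenting paths (Bellman-Ford on the residual graph). This is a simpler stand-in for the network simplex algorithm, not an implementation of it. The name `transport_min_cost`, the restriction to integer masses, and the example data are assumptions made for illustration.

```python
def transport_min_cost(supply, demand, cost):
    """Minimum total cost of shipping integer `supply` to integer `demand`.

    Successive shortest augmenting paths on a bipartite flow network:
    source -> supplier i -> consumer j -> sink. Assumes sum(supply) == sum(demand).
    """
    m, n = len(supply), len(demand)
    source, sink = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]   # edge: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(m):
        add_edge(source, i, supply[i], 0)
        for j in range(n):
            add_edge(i, m + j, supply[i], cost[i][j])
    for j in range(n):
        add_edge(m + j, sink, demand[j], 0)

    total_cost = 0
    remaining = sum(supply)
    while remaining > 0:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)           # (node, edge index) used to reach each node
        dist[source] = 0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, c, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            raise ValueError("demand cannot absorb the remaining supply")
        # Find the bottleneck capacity along the path, then push that much flow.
        push, v = remaining, sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_cost += push * dist[sink]
        remaining -= push
    return total_cost

# Piles at positions 0 and 4 shipped to positions 1 and 3; cost = distance.
work = transport_min_cost([3, 2], [1, 4], [[1, 3], [3, 1]])
print(work, work / 5)   # total work, and work normalized by the total flow
```

Dividing the returned work by the total mass moved gives the normalized form of the distance.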
The Hungarian algorithm can be used to obtain the solution if the domain D is the set {0,1}. If the domain is integral, the problem can be translated into the same form by representing each integral bin as multiple binary bins.
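When every bin holds either zero or one unit of mass, each distribution is just a set of unit-mass points and the problem becomes an assignment problem. The sketch below makes that reduction concrete with an exhaustive search over all pairings; this brute-force search is swapped in for the Hungarian algorithm purely for readability and is only usable for a handful of points. The name `assignment_work` and the sample points are assumptions.

```python
from itertools import permutations

def assignment_work(sources, targets, ground_distance):
    """Minimum total work to match equal-sized sets of unit-mass points.

    Exhaustive search over all pairings (factorial time), used here in place
    of the Hungarian algorithm, which solves the same problem efficiently.
    """
    if not sources or len(sources) != len(targets):
        raise ValueError("need two non-empty sets of the same size")
    n = len(sources)
    return min(
        sum(ground_distance(sources[i], targets[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )

work = assignment_work([0, 2, 5], [1, 6, 2], lambda a, b: abs(a - b))
print(work, work / 3)   # total work, and work normalized by the 3 units moved
```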
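As a special case, if D is a one-dimensional array of "bins", the EMD can be computed efficiently by scanning the array and keeping track of how much dirt needs to be transported between consecutive bins. Writing $P_i$ and $Q_i$ for the contents of bin $i$ in the two distributions:

$$\mathrm{EMD}_0 = 0, \qquad \mathrm{EMD}_{i+1} = P_i + \mathrm{EMD}_i - Q_i, \qquad \text{TotalDistance} = \sum_i |\mathrm{EMD}_i|$$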
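The one-dimensional scan can be sketched in a few lines. The function name `emd_1d` and the assumption that neighbouring bins are one unit of distance apart are illustrative choices; the running value `carried` is the amount of dirt that still has to cross into the next bin.

```python
def emd_1d(p, q):
    """Total work to turn histogram p into histogram q over the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    carried = 0.0      # dirt that must cross into the next bin (negative: flows backwards)
    total = 0.0
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        total += abs(carried)
    return total

print(emd_1d([3, 0, 0, 0, 2], [0, 1, 0, 4, 0]))
```

The scan takes time linear in the number of bins, and the returned value is the total work, not yet divided by the total mass.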
EMD-based similarity analysis
EMD-based similarity analysis (EMDSA) is an important and effective tool in many multimedia information retrieval and pattern recognition applications. However, the computational cost of the EMD is super-cubic in the number of "bins" for an arbitrary D. Efficient and scalable EMD computation techniques for large-scale data have been investigated using MapReduce, as well as bulk synchronous parallel and resilient distributed dataset frameworks.
Applications
An early application of the EMD in computer science was to compare two grayscale images that may differ due to dithering, blurring, or local deformations. In this case, the region is the image's domain, and the total amount of light (or ink) is the "dirt" to be rearranged.
The EMD is widely used in content-based image retrieval to compute distances between the color histograms of two digital images. In this case, the region is the RGB color cube, and each image pixel is a parcel of "dirt". The same technique can be used for any other quantitative pixel attribute, such as luminance, gradient, apparent motion in a video frame, etc.
More generally, the EMD is used in pattern recognition to compare generic summaries or surrogates of data records called signatures. A typical signature consists of a list of pairs $((x_1, m_1), \dots, (x_n, m_n))$, where each $x_i$ is a certain "feature" (e.g., a color in an image, a letter in a text, etc.), and $m_i$ is its "mass" (how many times that feature occurs in the record). Alternatively, $x_i$ may be the centroid of a data cluster, and $m_i$ the number of entities in that cluster. To compare two such signatures with the EMD, one must define a distance between features, which is interpreted as the cost of turning a unit mass of one feature into a unit mass of the other. The EMD between two signatures is then the minimum cost of turning one of them into the other.
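To make the feature/mass pairs concrete, here is a minimal sketch that builds letter-count signatures for two short texts and a matrix of ground distances between their features. The helper names and the choice of ground distance (the gap between alphabet positions) are hypothetical, chosen only to keep the example small.

```python
from collections import Counter

def text_signature(text):
    """Signature of a text: sorted (letter, count) pairs, ignoring non-letters."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    return sorted(counts.items())

def ground_distance_matrix(sig_a, sig_b):
    """Hypothetical ground distance: gap between the alphabet positions of two letters."""
    return [[abs(ord(x) - ord(y)) for y, _ in sig_b] for x, _ in sig_a]

sig_a = text_signature("abba")
sig_b = text_signature("abc")
costs = ground_distance_matrix(sig_a, sig_b)
print(sig_a, sig_b, costs)
```

The two signatures here carry different total masses (4 and 3), so the masses would have to be normalized, or a partial match allowed, before comparing them with the EMD.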
History
The concept was first introduced by Gaspard Monge in 1781, and it anchors the field of transportation theory. The use of the EMD as a distance measure for monochromatic images was described in 1989 by S. Peleg, M. Werman and H. Rom. The name "earth mover's distance" was proposed by J. Stolfi in 1994, and was used in print in 1998 by Y. Rubner, C. Tomasi and L. G. Guibas.
External links
- C code for the Earth Mover's Distance
- Python2 wrapper for the C implementation of the Earth Mover's Distance
- C, Matlab and Java wrapper code for the Earth Mover's Distance, especially efficient for thresholded ground distances
- Java implementation of a generic generator for evaluating Earth Mover's Distance-based similarity analysis
Source of the article : Wikipedia