On the Information Processing of One-Dimensional Wasserstein Distances with Finite Samples
Cheongjae Jang, Jonghyun Won, Soyeon Jun, Chun Kee Chung, Keehyoung Joo, Yung-Kyun Noh
TL;DR
The paper analyzes how the one-dimensional Wasserstein distance $W_1$ between finite samples encodes pointwise density differences (rates) and support changes. Using Poisson processes, it derives analytic expressions for expected spike distances that reveal rate-difference encoding and the integration of rate and shift information, with asymptotic behavior clarified as sample size grows. The authors validate these insights through synthetic data and real-world neural spike-train and amino-acid contact datasets, showing that Wasserstein-based features improve classification and representation tasks and offer complementary perspectives to KL-based measures. Overall, the work provides a rigorous finite-sample interpretation of $W_1$ as a mixture of rate and support information, with practical implications for neuroscience and molecular biology and potential extensions to sliced Wasserstein distances.
Abstract
The Wasserstein distance -- a summation of sample-wise transport distances in data space -- is advantageous in many applications for measuring support differences between two underlying density functions. However, when the supports overlap significantly while the densities differ substantially pointwise, it remains unclear whether and how this transport information can accurately identify these differences, and in particular how it can be characterized analytically in finite-sample settings. We address this issue by analyzing the information-processing capabilities of the one-dimensional Wasserstein distance with finite samples. Using the Poisson process and isolating the rate factor, we demonstrate that Wasserstein distances capture pointwise density differences and show how this information is integrated with support differences. The analyzed properties are confirmed on neural spike train decoding and amino acid contact frequency data. The results reveal that the one-dimensional Wasserstein distance highlights meaningful density differences related to both rate and support.
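To make the quantity under discussion concrete, the following is a minimal sketch of an empirical one-dimensional $W_1$ between two finite samples, together with a homogeneous Poisson spike-train sampler. It is not the paper's implementation: the function names `w1_empirical` and `poisson_spikes` are hypothetical, and the sketch assumes each sample is treated as a normalized empirical measure, so that $W_1$ equals the integral of the absolute difference between the two empirical CDFs. The spike distance analyzed in the paper may be defined or normalized differently.

```python
import random


def w1_empirical(xs, ys):
    """W1 between the normalized empirical measures of two 1-D samples.

    Assumption: each sample carries total mass 1. The distance is the
    integral of |F_x(t) - F_y(t)| over t, where F_x and F_y are the
    empirical CDFs. Sample sizes may differ.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(xs + ys)  # CDFs are constant between merged points
    n, m = len(xs), len(ys)
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Advance the counts so i/n and j/m are the CDF values on [left, right).
        while i < n and xs[i] <= left:
            i += 1
        while j < m and ys[j] <= left:
            j += 1
        total += abs(i / n - j / m) * (right - left)
    return total


def poisson_spikes(rate, duration, rng):
    """Spike times of a homogeneous Poisson process on [0, duration].

    Uses exponential inter-spike intervals with the given rate.
    """
    t, spikes = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > duration:
            return spikes
        spikes.append(t)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, illustrative only
    a = poisson_spikes(20.0, 1.0, rng)  # lower-rate train
    b = poisson_spikes(40.0, 1.0, rng)  # higher-rate train, same support
    if a and b:
        print(len(a), len(b), w1_empirical(a, b))
```

For a pure shift, such as `w1_empirical([0.0, 2.0], [1.0, 3.0])`, this sketch returns the shift size 1.0. Because it normalizes each sample to unit mass, it discards the spike counts themselves, which is one reason it should be read only as an illustration of the transport computation rather than of the rate analysis in the paper.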
