On the Information Processing of One-Dimensional Wasserstein Distances with Finite Samples

Cheongjae Jang, Jonghyun Won, Soyeon Jun, Chun Kee Chung, Keehyoung Joo, Yung-Kyun Noh

TL;DR

The paper analyzes how the one-dimensional Wasserstein distance $W_1$ between finite samples encodes pointwise density differences (rates) and support changes. Using Poisson processes, it derives analytic expressions for expected spike distances that reveal rate-difference encoding and the integration of rate and shift information, with asymptotic behavior clarified as sample size grows. The authors validate these insights through synthetic data and real-world neural spike-train and amino-acid contact datasets, showing that Wasserstein-based features improve classification and representation tasks and offer complementary perspectives to KL-based measures. Overall, the work provides a rigorous finite-sample interpretation of $W_1$ as a mixture of rate and support information, with practical implications for neuroscience and molecular biology and potential extensions to sliced Wasserstein distances.

Abstract

Leveraging the Wasserstein distance -- a summation of sample-wise transport distances in data space -- is advantageous in many applications for measuring support differences between two underlying density functions. However, when supports significantly overlap while densities exhibit substantial pointwise differences, it remains unclear whether and how this transport information can accurately identify these differences, particularly their analytic characterization in finite-sample settings. We address this issue by conducting an analysis of the information processing capabilities of the one-dimensional Wasserstein distance with finite samples. By utilizing the Poisson process and isolating the rate factor, we demonstrate the capability of capturing the pointwise density difference with Wasserstein distances and how this information harmonizes with support differences. The analyzed properties are confirmed using neural spike train decoding and amino acid contact frequency data. The results reveal that the one-dimensional Wasserstein distance highlights meaningful density differences related to both rate and support.
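As a rough illustration of the "summation of sample-wise transport distances" described above, the following sketch computes the one-dimensional Wasserstein distance between two empirical samples of equal size by matching sorted samples. The function name `wasserstein_1d` and the uniform $1/N$ weighting (a mean rather than a raw sum) are assumptions made for this example, not notation taken from the paper.

```python
import numpy as np

def wasserstein_1d(x, y):
    """W_1 between two equal-size 1-D samples (hypothetical helper).

    In one dimension the optimal transport plan pairs the k-th smallest
    point of x with the k-th smallest point of y, so the distance is the
    average of the absolute differences between sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # Assumption: uniform weights 1/N, i.e. a mean instead of a raw sum.
    return float(np.mean(np.abs(x - y)))

if __name__ == "__main__":
    # Sorted samples are [0, 1, 2] and [1, 2, 3]; each point moves by 1.
    print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))
```

Sorting is what makes the one-dimensional case tractable: no general optimal-transport solver is needed, and the cost is dominated by the two sorts.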

Paper Structure

This paper contains 36 sections, 2 theorems, 41 equations, 12 figures, 3 tables.

Key Result

Proposition 3.1

For the $k$-th spikes $x_k$ and $y_k$ obtained from two Poisson processes of rates $\lambda_1$ and $\lambda_2$, respectively, the expectation of the distance between $x_k$ and $y_k$ is where $p = \lambda_1/(\lambda_1+\lambda_2)$ and $P(i|2k, p) = \binom{2k}{i} p^{i}(1-p)^{2k-i}$ is the binomial distribution with the parameters $2k$ and $p$. The minimum of $\mathbb{E}[|x_k - y_k|]$ is achieved when

Figures (12)

  • Figure 1: Sample transport between empirical distributions derived from the underlying one-dimensional distributions $\mu$ and $\nu$. Blue and red spikes represent samples drawn from $\mu$ and $\nu$, respectively, with sample transport distances illustrated by dotted arrows. In (a), the prominent difference is in the support of $\mu$ and $\nu$, while in (b), the two densities are significantly different despite having the same support. In (b), it is desirable for the measure derived from the sample transport between empirical distributions to represent the pointwise density difference in the underlying distributions.
  • Figure 2: $\mathbb{E}[W(\hat{\mu}_N, \hat{\nu}_N)]$ for $\lambda_1, \lambda_2 \in [1, 5]$ and $N=20$. When the harmonic mean of the rates is constant (black dashed lines), $\mathbb{E}[W(\hat{\mu}_N, \hat{\nu}_N)]$ shows its minimum where the rates are equal ($\lambda_1 = \lambda_2$, red solid line).
  • Figure 3: A one-dimensional example to compare information processing of the Hausdorff distance, Jensen-Shannon divergence, and Wasserstein distance. In (a), blue and red spikes represent the samples generated by Poisson processes with time-varying rates $\mu(t)$ and $\nu(t)$, respectively. In (b), the values are averaged over 1,000 trials.
  • Figure 4: Three Isomap embeddings of human neural spike trains. We present the embedding obtained using the Wasserstein distance in (a), that from the spike count difference in (b), and that from the Victor-Purpura (VP) distance in (c).
  • Figure 5: Spike count histograms for windows starting at 210.0 seconds, 213.5 seconds, 231.0 seconds, and 234.5 seconds. Shaded areas in each histogram indicate the beginning and end regions of each window.
  • ...and 7 more figures
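
The behavior described in the Figure 2 caption can be probed numerically. The sketch below is a Monte Carlo estimate, not the paper's analytic expression: it simulates the first $N$ spikes of two homogeneous Poisson processes and averages $|x_k - y_k|$ over $k$ and over trials. The function name `expected_w1`, the $1/N$ normalization, the trial count, and the seed are all assumptions chosen for illustration.

```python
import numpy as np

def expected_w1(rate_1, rate_2, n_spikes=20, n_trials=5000, seed=0):
    """Monte Carlo estimate of E[W] between the first n_spikes of two
    homogeneous Poisson processes (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Inter-arrival times of a Poisson process with rate r are
    # exponential with mean 1/r; the k-th spike time is their running sum.
    x = np.cumsum(rng.exponential(1.0 / rate_1, size=(n_trials, n_spikes)), axis=1)
    y = np.cumsum(rng.exponential(1.0 / rate_2, size=(n_trials, n_spikes)), axis=1)
    # Spike times are already ordered, so the 1-D transport pairs the
    # k-th spike of one train with the k-th spike of the other.
    # Assumption: W is normalized by 1/N (mean over k).
    return float(np.abs(x - y).mean(axis=1).mean())

def harmonic_mean(a, b):
    return 2.0 * a * b / (a + b)

if __name__ == "__main__":
    # Both rate pairs share the same harmonic mean (2.0).
    for rates in [(2.0, 2.0), (1.5, 3.0)]:
        print(rates, harmonic_mean(*rates), expected_w1(*rates))
```

With these settings the equal-rate pair should give the smaller estimate of the two, which is consistent with the caption's statement that the minimum along a constant-harmonic-mean curve lies at $\lambda_1 = \lambda_2$. The exact values depend on the seed and the number of trials.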

Theorems & Definitions (2)

  • Proposition 3.1
  • Proposition 3.2