Moment Expansions of the Energy Distance
Ian Langmore
TL;DR
The paper analyzes the squared energy distance ${\mathcal{D}^2}(X,Y)$ in the regime where the distributions are close, showing that the mean difference $\mu$ typically dominates the loss, with covariance differences entering at higher order via an averaged, dimension-dependent term. By expressing ${\mathcal{D}^2}$ through a Fourier-cumulant expansion and introducing a decay scale $\lambda$, the author derives a leading-moments expansion: a main $O(1/\lambda)$ term proportional to $\|\mu\|^2$ and a secondary $O(1/\lambda^3)$ term involving the covariance difference $\Delta$, $\mu$, and skew cumulants, plus a controlled remainder. The paper then specializes to multivariate Gaussians to obtain explicit forms and demonstrates that off-diagonal covariance contributions are suppressed by a factor of order $1/d$ under spherical symmetry, while the diagonal part contributes at order $O(d^{-1/2})$ to the mean term. The work also contrasts the energy-distance-based gradient with a standard covariance loss via a cosine similarity analysis, showing how dimension and correlation structure influence learning dynamics. Numerical verification across Gaussian and non-Gaussian distributions confirms the leading-moments predictions and highlights the regimes where the theory holds, offering practical guidance for using energy-distance-based losses in high-dimensional learning tasks.
Abstract
The energy distance is used to test distributional equality and as a loss function in machine learning. While $D^2(X, Y)=0$ only when $X\sim Y$, the sensitivity to different moments is of practical importance. This work considers $D^2(X, Y)$ in the case where the distributions are close. In this regime, $D^2(X, Y)$ is more sensitive to differences in the means $\bar{X}-\bar{Y}$ than to differences in the covariances $\Delta$. This is due to the structure of the energy distance and is independent of dimension. The sensitivity to on- versus off-diagonal components of $\Delta$ is examined when $X$ and $Y$ are close to isotropic. Here a dimension-dependent averaging occurs and, in many cases, off-diagonal correlations contribute significantly less. Numerical results verify that these relationships hold even when distributional assumptions are not strictly met.
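To make the quantity under discussion concrete, the following is a minimal sketch of a sample-based estimate of the squared energy distance. It assumes the standard definition $D^2(X,Y) = 2\,\mathbb{E}\|X-Y\| - \mathbb{E}\|X-X'\| - \mathbb{E}\|Y-Y'\|$, which is not restated in this section, and it uses a plain plug-in (V-statistic) average over all sample pairs. The function names, the dimension, the sample size, and the perturbation sizes are all illustrative choices, not values taken from the paper.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y) with x in a and y in b."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def energy_distance_sq(xs, ys):
    """Plug-in estimate of D^2 = 2 E|X-Y| - E|X-X'| - E|Y-Y'| (assumed definition)."""
    return (2.0 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))


def gaussian_sample(n, mean, scale, rng):
    """Draw n points from an isotropic Gaussian with the given mean and per-coordinate std."""
    return [[m + scale * rng.gauss(0.0, 1.0) for m in mean] for _ in range(n)]


# Illustrative setup: a reference sample, a mean-shifted sample, and a rescaled sample.
rng = random.Random(0)
d, n = 3, 200
base = gaussian_sample(n, [0.0] * d, 1.0, rng)
shifted = gaussian_sample(n, [0.3] + [0.0] * (d - 1), 1.0, rng)   # mean perturbed
rescaled = gaussian_sample(n, [0.0] * d, 1.3, rng)                # covariance perturbed

print("mean shift      :", energy_distance_sq(base, shifted))
print("covariance shift:", energy_distance_sq(base, rescaled))
```

With samples this small the two printed values are noisy, so a single run should not be read as confirming the ordering described in the abstract. Comparing the sensitivities reliably would require larger samples or averaging over repeated draws.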
