Fast Wasserstein rates for estimating probability distributions of probabilistic graphical models
Daniel Bartl, Stephan Eckstein
TL;DR
This work analyzes nonparametric estimation of distributions in Wasserstein distance when the distribution follows a probabilistic graphical model with a known graph. By quantifying the smoothness of the conditional kernels via Wasserstein-Lipschitz (WLip) and TV-Lipschitz (TV-Lip) conditions, it shows that estimation rates depend on the local graph structure through the local dimension $d_{\rm loc}$ rather than the ambient dimension $d$. It introduces tractable estimators that achieve rates $\lesssim n^{-1/d_{\rm loc}}$ (WLip) or $\lesssim n^{-2/(2+d_{\rm loc})} + n^{-1/d_{\max}}$ (TV-Lip), with additional log factors in boundary cases; under the TV-Lipschitz condition these rates are shown to be sharp. The results highlight when graph-based biases accelerate learning and establish fundamental limits when continuity is absent, informing both theory and practice for structured nonparametric estimation in graphical models.
Abstract
Using i.i.d. data to estimate a high-dimensional distribution in Wasserstein distance is a fundamental instance of the curse of dimensionality. We explore how structural knowledge about the data-generating process that gives rise to the distribution can be used to overcome this curse. More precisely, we work with the set of distributions of probabilistic graphical models for a known directed acyclic graph. It turns out that this knowledge is only helpful if it can be quantified, which we formalize via smoothness conditions on the transition kernels in the disintegration corresponding to the graph. In this case, we prove that the rate of estimation is governed by the local structure of the graph, more precisely by the dimensions corresponding to single nodes together with their parent nodes. The precise rate depends on the notion of smoothness assumed for the kernels: the weak (Wasserstein-Lipschitz) and strong (bidirectional Total-Variation-Lipschitz) conditions lead to different results. We prove sharpness under the strong condition and show that this condition is satisfied, for example, by distributions with a positive Lipschitz density.
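
As a rough illustration of the gap between ambient and local dimension, the following Python sketch compares the two rate exponents on a toy graph. The function name `local_dimension`, the example graph, and the reading of the local dimension as the largest combined dimension of a node and its parents are assumptions made for illustration; the formal definition of $d_{\rm loc}$ in the paper may differ, and constants and logarithmic factors are ignored.

```python
from typing import Dict, List


def local_dimension(parents: Dict[str, List[str]], dims: Dict[str, int]) -> int:
    """Largest combined dimension of a node and its parents.

    Assumed reading of the "local dimension"; not taken verbatim from the paper.
    """
    return max(dims[v] + sum(dims[p] for p in parents[v]) for v in parents)


# Hypothetical example: ten scalar nodes X1, ..., X10, where each node
# depends on (at most) its two predecessors.
nodes = [f"X{i}" for i in range(1, 11)]
parents = {v: nodes[max(0, i - 2):i] for i, v in enumerate(nodes)}
dims = {v: 1 for v in nodes}

d = sum(dims.values())                 # ambient dimension
d_loc = local_dimension(parents, dims) # assumed local dimension

n = 10**6
rate_ambient = n ** (-1 / d)       # order of the unstructured rate n^{-1/d}
rate_local = n ** (-1 / d_loc)     # order of the WLip rate n^{-1/d_loc}

print(f"d = {d}, d_loc = {d_loc}")
print(f"n^(-1/d) = {rate_ambient:.3g}, n^(-1/d_loc) = {rate_local:.3g}")
```

In this toy setting the ambient dimension is 10 while the assumed local dimension is 3, so the structured rate decays far faster in $n$ than the unstructured one.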
