Wasserstein multivariate auto-regressive models for modeling distributional time series
Yiye Jiang, Jérémie Bigot
TL;DR
The paper models multiple series of time-indexed probability distributions by treating them as random objects in the Wasserstein space and proposing a Wasserstein multivariate autoregressive (WMAR) model. Using the theory of iterated random function (IRF) systems, it establishes existence, uniqueness, and second-order stationarity of the model's solution. It also introduces a constrained estimator whose simplex constraints induce sparsity, which makes it possible to learn a temporal dependency graph among the series. The authors provide a practical centering strategy, a quantile-function representation, and a consistent estimation procedure, validated through simulations and real-data applications to age distributions across countries and to the Paris bike-sharing network. The framework supports interpretable analysis of cross-series dependencies in distributional time series, backed by statistical guarantees.
Abstract
This paper is focused on the statistical analysis of data consisting of a collection of multiple series of probability measures that are indexed by distinct time instants and supported over a bounded interval of the real line. By modeling these time-dependent probability measures as random objects in the Wasserstein space, we propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. Using the theory of iterated random function systems, we provide results on the second-order stationarity of the solution of this model. We also propose a consistent estimator for the auto-regressive coefficients of the model. Because we impose simplex constraints on the model coefficients, the proposed estimator, which is learned under these constraints, naturally has a sparse structure. This sparsity allows the model to be used for learning a graph of temporal dependency from multivariate distributional time series. We explore the numerical performance of our estimation procedure using simulated data. To shed some light on the benefits of our approach for real data analysis, we also apply this methodology to two data sets, made respectively of observations of age distributions in different countries and of observations from the bike-sharing network in Paris.
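The quantile-function view of one-dimensional Wasserstein space mentioned above can be illustrated with a minimal sketch. It is not the authors' estimator or model. It only shows two standard facts: the 2-Wasserstein distance between measures on the real line equals the L2 distance between their quantile functions, and a combination of quantile functions with weights on the simplex is again a valid quantile function. All function names, the beta-distributed samples, and the weights are hypothetical choices made for illustration.

```python
import numpy as np

def quantile_levels(m=200):
    # Midpoint levels in (0, 1); avoids the endpoints 0 and 1.
    return (np.arange(m) + 0.5) / m

def empirical_quantiles(samples, levels):
    # Empirical quantile function of a 1D sample evaluated at `levels`.
    return np.quantile(np.asarray(samples, dtype=float), levels)

def wasserstein2(q1, q2):
    # 2-Wasserstein distance as the L2 distance between quantile functions,
    # approximated by a midpoint Riemann sum on a uniform grid of levels.
    diff = np.asarray(q1, dtype=float) - np.asarray(q2, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def convex_combination(quantiles, weights):
    # `quantiles`: (N, m) array, one quantile function per series.
    # `weights`: length-N vector on the simplex (non-negative, sums to 1).
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must lie on the simplex")
    return weights @ np.asarray(quantiles, dtype=float)

levels = quantile_levels()
rng = np.random.default_rng(0)

# Three hypothetical series at one time instant, each supported on [0, 1].
q = np.stack([
    empirical_quantiles(rng.beta(a, b, size=500), levels)
    for a, b in [(2, 5), (5, 2), (3, 3)]
])

# Sparse simplex weights: the third series receives zero weight.
pred = convex_combination(q, [0.7, 0.3, 0.0])

print("W2(series 1, series 2):", wasserstein2(q[0], q[1]))
print("combination is non-decreasing:", bool(np.all(np.diff(pred) >= -1e-12)))
```

In this sketch, a zero weight means the corresponding series does not contribute to the combination at all. This is the sense in which sparse coefficients can be read as missing edges in a dependency graph.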
