Fréchet regression of multivariate distributions with nonparanormal transport

Junyoung Park, Irina Gaynanova

TL;DR

A new regression approach for multivariate distributional responses that models distributions within the semiparametric nonparanormal family. The paper also provides theoretical justification for the nonparanormal transport (NPT) metric, establishing its topological equivalence to the Wasserstein distance and proving that it mitigates the curse of dimensionality.

Abstract

Regression with distribution-valued responses and Euclidean predictors has gained increasing scientific relevance. While methodology for univariate distributional data has advanced rapidly in recent years, multivariate distributions, which additionally encode dependence across univariate marginals, have received less attention and pose computational and statistical challenges. In this work, we address these challenges with a new regression approach for multivariate distributional responses, in which distributions are modeled within the semiparametric nonparanormal family. By incorporating the nonparanormal transport (NPT) metric -- an efficient closed-form surrogate for the Wasserstein distance -- into the Fréchet regression framework, our approach decomposes the problem into separate regressions of marginal distributions and their dependence structure, facilitating both efficient estimation and granular interpretation of predictor effects. We provide theoretical justification for NPT, establishing its topological equivalence to the Wasserstein distance and proving that it mitigates the curse of dimensionality. We further prove uniform convergence guarantees for regression estimators, both when distributional responses are fully observed and when they are estimated from empirical samples, attaining fast convergence rates comparable to the univariate case. The utility of our method is demonstrated via simulations and an application to continuous glucose monitoring data.
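
To make the "regression of marginal distributions" component concrete, the following is a minimal sketch of standard global Fréchet regression for univariate distributional responses under the 2-Wasserstein metric, where each response is represented by its quantile function on a common grid. It is not the paper's NPT estimator, and it does not cover the dependence-structure regression; the function names (`isotonic_projection`, `global_frechet_weights`, `frechet_quantile_fit`) and the grid representation are assumptions made for illustration.

```python
import numpy as np


def isotonic_projection(y):
    """L2 projection of a vector onto non-decreasing vectors (pool adjacent violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        # Merge the last two blocks while their means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def global_frechet_weights(X, x):
    """Weights s_i(x) = 1 + (X_i - Xbar)^T Sigma^{-1} (x - Xbar) for predictors X of shape (n, p)."""
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    Xbar = X.mean(axis=0)
    Xc = X - Xbar
    Sigma = Xc.T @ Xc / X.shape[0]  # empirical covariance of the predictors
    return 1.0 + Xc @ np.linalg.solve(Sigma, x - Xbar)


def frechet_quantile_fit(X, Q, x):
    """Fitted quantile function at predictor value x.

    Q has shape (n, m): row i is the quantile function of response i
    evaluated on a common grid of m probability levels.
    """
    w = global_frechet_weights(X, x)
    q = w @ np.asarray(Q, dtype=float) / len(w)  # weighted average of quantile functions
    return isotonic_projection(q)  # weights can be negative, so restore monotonicity
```

When the location of each response is exactly linear in the predictor and the shape is shared, the weighted average reproduces the linear trend, which gives a simple sanity check for the weights.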

Paper Structure

This paper contains 60 sections, 30 theorems, 249 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

The map $\Lambda:\mathcal{P}^d \times \mathcal{E}_d\to \mathcal{P}_2(\mathbb{R}^d)$ is injective on $\mathcal{P}_*^d\times \mathcal{E}_d$.

Figures (4)

  • Figure 1: Out-of-sample mean squared prediction errors (MSPEs) on $n^{\text{te}}=500$ test samples over $n_{\text{rep}}=100$ Monte Carlo replicates. Columns correspond to $d=2$ with linear correlation, $d=2$ with nonlinear correlation, and $d=10$. Within each column, boxplots report $\text{MSPE}_{\text{marg}}$ (top) and $\text{MSPE}_{\text{corr}}$ (bottom, log scale) across the $(n, N)$ settings shown on the $x$-axis.
  • Figure 2: Bivariate correlation regression in one Monte Carlo replicate with $(n, N) = (200, 1000)$. Gray points are the generated correlations plotted against the associated predictor component, and solid black curves show the noise-free generating functions $0.3Z^{(1)}$ (left) and $\tanh(2Z^{(2)})$ (right). The fitted curves from NPT-FR (dashed) and Gaussian-FR (dotted) are overlaid.
  • Figure 3: Nonparanormal Fréchet regression fit of trivariate CGM distributions on HbA1c. Top: Heatmap of the (Mean, MAD) component of fitted trivariate distributions at different HbA1c levels within the interval $[5.10, 10.95]$; Lcor in the title of each panel denotes the fitted latent correlation between Mean and MAD. Bottom: Scatterplot of observed latent correlations (points) and the nonparanormal regression fit (red line) as functions of HbA1c.
  • Figure S1: Out-of-sample mean squared prediction errors in $d_W$ ($\text{MSPE}_{\text{wass}}$) on $n^{\text{te}}=500$ test samples over $n_{\text{rep}}=100$ Monte Carlo replicates. Columns correspond to $d=2$ with linear correlation, $d=2$ with nonlinear correlation, and $d=10$.

Theorems & Definitions (52)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • ...and 42 more