
Vector Quantile Regression on Manifolds

Marco Pegoraro, Sanketh Vedula, Aviv A. Rosenberg, Irene Tallini, Emanuele Rodolà, Alex M. Bronstein

TL;DR

This work extends quantile regression to data on non-Euclidean manifolds by formulating conditional vector quantile functions on manifolds (M-CVQFs) via a dual, optimal-transport (OT) based approach. It defines the manifold vector quantile function (M-VQF) as the map $Q_{\boldsymbol{Y}}(\boldsymbol{u})=\exp_{\boldsymbol{u}}[-\nabla_{\boldsymbol{u}}\varphi(\boldsymbol{u})]$ and extends it to the conditional setting (M-VQR) by learning $c$-concave potentials parameterized with partially input $c$-concave networks. The method enables conditional sampling, likelihood estimation, and confidence-set construction on spheres and tori, is demonstrated on synthetic and real datasets, and scales favorably relative to prior spherical approaches. The approach broadens the applicability of QR to domains where data live on curved spaces, offering a principled and scalable way to capture complex conditional distributions on manifolds, with potential impact in fields such as climate science, biology, and structural biology.
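To make the map $Q_{\boldsymbol{Y}}(\boldsymbol{u})=\exp_{\boldsymbol{u}}[-\nabla_{\boldsymbol{u}}\varphi(\boldsymbol{u})]$ concrete, here is a minimal sketch on the unit sphere $\mathcal{S}^2$. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`sphere_exp`, `sphere_log`, `grad_phi`, `quantile_map`) are hypothetical, and the potential is a hand-picked toy choice, $\varphi(\boldsymbol{u})=\tfrac{1}{2}d(\boldsymbol{u},\boldsymbol{y}_0)^2$, whose Riemannian gradient is $-\log_{\boldsymbol{u}}(\boldsymbol{y}_0)$. With this potential the map sends every non-antipodal $\boldsymbol{u}$ to $\boldsymbol{y}_0$, which is what one would expect for a point mass at $\boldsymbol{y}_0$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sphere_exp(u, v):
    """Exponential map on the unit sphere: start at u, follow tangent vector v."""
    n = norm(v)
    if n < 1e-12:  # zero tangent vector: stay at u
        return tuple(u)
    return tuple(math.cos(n) * ui + math.sin(n) * vi / n for ui, vi in zip(u, v))

def sphere_log(u, y):
    """Log map: tangent vector at u pointing toward y, with length d(u, y)."""
    c = max(-1.0, min(1.0, dot(u, y)))
    w = tuple(yi - c * ui for ui, yi in zip(u, y))  # part of y tangent to the sphere at u
    n = norm(w)
    if n < 1e-12:  # y equals u (or is antipodal, where the log map is undefined)
        return tuple(0.0 for _ in u)
    theta = math.acos(c)  # geodesic distance d(u, y)
    return tuple(theta * wi / n for wi in w)

def grad_phi(u, y0):
    """Riemannian gradient of the toy potential phi(u) = 0.5 * d(u, y0)^2 (an assumption)."""
    return tuple(-li for li in sphere_log(u, y0))

def quantile_map(u, grad):
    """Q(u) = exp_u(-grad phi(u)) for a user-supplied Riemannian gradient."""
    g = grad(u)
    return sphere_exp(u, tuple(-gi for gi in g))

y0 = (0.0, 0.0, 1.0)
u = (0.6, 0.0, 0.8)
q = quantile_map(u, lambda p: grad_phi(p, y0))
```

In the paper the potential is a learned $c$-concave network rather than a closed-form function; in that setting the gradient would typically come from automatic differentiation followed by projection onto the tangent space at $\boldsymbol{u}$.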

Abstract

Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on a Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on, e.g., spheres (climate and geological phenomena) and tori (dihedral angles in proteins). By leveraging optimal transport theory and $c$-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets and likelihoods. We demonstrate the approach's efficacy and provide insights regarding the meaning of non-Euclidean quantiles through synthetic and real data experiments.
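For readers unfamiliar with $c$-concavity, the block below recalls the standard optimal-transport definitions for the squared-geodesic cost. It is a reminder in textbook notation, not a quotation from the paper: the symbols $\psi$ and $\varphi^{c}$ are introduced here for illustration, and the paper's own sign conventions may differ.

```latex
% Cost: half the squared geodesic distance on the manifold M
c(\boldsymbol{u}, \boldsymbol{y}) = \tfrac{1}{2}\, d(\boldsymbol{u}, \boldsymbol{y})^{2}

% c-transform of a potential \varphi
\varphi^{c}(\boldsymbol{y}) = \inf_{\boldsymbol{u} \in \mathcal{M}}
    \bigl[\, c(\boldsymbol{u}, \boldsymbol{y}) - \varphi(\boldsymbol{u}) \,\bigr]

% \varphi is c-concave if it is itself a c-transform of some function \psi
\varphi(\boldsymbol{u}) = \inf_{\boldsymbol{y} \in \mathcal{M}}
    \bigl[\, c(\boldsymbol{u}, \boldsymbol{y}) - \psi(\boldsymbol{y}) \,\bigr]

% For c-concave \varphi, the induced transport map has the exponential-map form
T(\boldsymbol{u}) = \exp_{\boldsymbol{u}}\bigl[ -\nabla_{\boldsymbol{u}} \varphi(\boldsymbol{u}) \bigr]
```

The last line has the same form as the M-VQF map $Q_{\boldsymbol{Y}}(\boldsymbol{u})$ given in the TL;DR.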

Paper Structure (54 sections, 42 equations, 15 figures, 2 tables)

This paper contains 54 sections, 42 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Sampling and confidence sets for the 'Scaled Heart' and 'Scaled Star' distributions $\boldsymbol{\mathrm{Y}}|\boldsymbol{\mathrm{X}}$ on $\mathcal{S}^{2}$ and $\mathcal{T}_2$, under different conditioning values. The conditioning variable $x$ controls the scale of the distribution. $\tau$-contours shown as colored lines. The probability of $\boldsymbol{\mathrm{Y}}|\boldsymbol{\mathrm{X}}$ falling inside a $\tau$-contour is $\tau$.
  • Figure 2: Impact of involution regularization. Results are from an M-VQE trained on $\mathcal{S}^2$, where the target is a von-Mises distribution, the $c$-concave potential has 3 layers, and $\gamma=0.1$. Involution error is dramatically reduced when training with the involution regularization.
  • Figure 3: M-VQE approximation of the quantile function $Q_{\boldsymbol{\mathrm{Y}}}$ of the 'Multimodal von-Mises' distribution produces nested, smooth, and valid contours and correctly estimates the likelihood function. Subfigure (a) shows the ground-truth and estimated likelihood functions. We use the Mollweide projection to plot the whole sphere surface. Subfigure (b) shows $\tau$-contours overlaid on the ground-truth samples, for different values of $\tau$. Subfigure (c) plots the requested coverage level on the horizontal axis and the coverage achieved by the model on the vertical axis.
  • Figure 4: Likelihood function $p_{\boldsymbol{\mathrm{Y}}|\boldsymbol{\mathrm{X}}}$ for the 'Conditional Multimodal' distribution. The covariate $\mathrm{X}$ controls the scale of the distribution. $ESS_{\%}$ values are also reported. We use the Mollweide projection to plot the whole sphere surface.
  • Figure 5: $\tau$-confidence sets constructed with M-VQR on the 'Continental Drift' dataset are smooth, nested, and valid. Subfigures (a) and (b) show $\tau$-contours overlaid on the ground-truth samples, for different values of $\tau$; each subfigure represents conditioning on a different era. The Mollweide projection is used to visualize the whole sphere. Subfigure (c) shows the coverage achieved by the model as a function of the requested coverage level, averaged over the different conditionings, with the corresponding confidence bars.
  • ...and 10 more figures