Multiple-output composite quantile regression through an optimal transport lens

Xuzhi Yang; Tengyao Wang

Abstract

Composite quantile regression has been used to obtain robust estimators of regression coefficients in linear models with good statistical efficiency. By revealing an intrinsic link between the composite quantile regression loss function and the Wasserstein distance from the residuals to the set of quantiles, we establish a generalization of composite quantile regression to the multiple-output setting. Theoretical convergence rates of the proposed estimator are derived both in the setting where the additive error possesses only a finite $\ell$-th moment (for $\ell > 2$) and in the setting where it exhibits a sub-Weibull tail. In doing so, we develop novel techniques for analyzing M-estimation problems that involve a Wasserstein distance in the loss. Numerical studies confirm the practical effectiveness of our proposed procedure.
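To make the starting point of the abstract concrete, the following is a minimal sketch of the classical univariate composite quantile regression objective, i.e. the check loss averaged over observations and over a grid of quantile levels with one intercept per level. It is not the multiple-output estimator proposed in the paper. The function names (`check_loss`, `cqr_loss`, `profiled_intercepts`), the simulated data and the quantile grid are all assumptions made for illustration.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def cqr_loss(beta, intercepts, X, y, taus):
    """Average of rho_{tau_k}(y_i - b_k - x_i' beta) over observations i and levels k."""
    resid = y - X @ beta                                   # shape (n,)
    u = resid[:, None] - np.asarray(intercepts)[None, :]   # shape (n, K)
    return float(np.mean(check_loss(u, np.asarray(taus)[None, :])))

def profiled_intercepts(beta, X, y, taus):
    """For fixed beta, empirical quantiles of the residuals (illustrative choice of intercepts)."""
    return np.quantile(y - X @ beta, taus)

# Simulated data with heavy-tailed noise (illustrative assumption, not the paper's setup).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=3, size=n)

taus = np.arange(1, 10) / 10.0                 # quantile levels 0.1, ..., 0.9
b_hat = profiled_intercepts(beta_true, X, y, taus)
loss_at_quantiles = cqr_loss(beta_true, b_hat, X, y, taus)
loss_shifted = cqr_loss(beta_true, b_hat + 0.3, X, y, taus)
```

For a fixed coefficient vector, each intercept enters the objective only through its own check loss, which is minimized at a sample quantile of the residuals. This is why the sketch sets the intercepts to residual quantiles, and why shifting them away from those quantiles cannot lower the loss in this example.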

Paper Structure (22 sections, 27 theorems, 179 equations, 3 figures)

This paper contains 22 sections, 27 theorems, 179 equations, 3 figures.

Introduction
Related works
Notation
The MCQR construction
Univariate CQR revisited
Multiple-output CQR via optimal transport
Solving MCQR via linear programming
Theoretical guarantees
Numerical experiments
Proof of main results
Preliminaries on optimal transport theory
Additional notation
Proof for Lemma \ref{le: justification}
Proof for Lemma \ref{le: 1dmcqr}
Proof for Proposition \ref{prop: unique}
...and 7 more sections

Key Result

Lemma 1

Under the linear model \eqref{mlm}, we have

Figures (3)

Figure 1: Illustration of proofs.
Figure 2: Logarithmic average loss, measured in matrix Mahalanobis norm, of the regression coefficient estimated by MCQR, CoorCQR, SpQR and LS for data generated according to the mechanism described in Section \ref{sec: imp}, for various sample sizes $n$, covariate dimensions $p$ and response dimensions $d$, and four different noise distributions (panels (a) to (d)).
Figure 3: Logarithmic average estimation loss, measured in matrix Mahalanobis norm, of the regression coefficient estimated by MCQR, CoorCQR, SpQR and LS for data generated according to the mechanism described in Section \ref{sec: imp}, for various outlier contamination proportions (from $0.05$ to $0.5$), covariate dimensions $p$ and response dimensions $d$, and two different noise contamination models. We fix $n=200$.

Theorems & Definitions (53)

Lemma 1
Lemma 2
Proposition 3
Definition 4
Theorem 5
Lemma 6
Lemma 7
Theorem 8
Theorem 9
Theorem 10: Kantorovich--Rubinstein theorem
...and 43 more
