Multiple-output composite quantile regression through an optimal transport lens

Xuzhi Yang, Tengyao Wang

Abstract

Composite quantile regression has been used to obtain robust estimators of regression coefficients in linear models with good statistical efficiency. By revealing an intrinsic link between the composite quantile regression loss function and the Wasserstein distance from the residuals to the set of quantiles, we establish a generalization of composite quantile regression to the multiple-output setting. Theoretical convergence rates of the proposed estimator are derived both when the additive error possesses only a finite $\ell$-th moment (for $\ell > 2$) and when it exhibits a sub-Weibull tail. In doing so, we develop novel techniques for analyzing M-estimation problems that involve the Wasserstein distance in the loss. Numerical studies confirm the practical effectiveness of our proposed procedure.
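
For orientation, the classical single-output composite quantile regression loss that the abstract takes as its starting point can be sketched as below. This is the standard textbook formulation (one shared slope vector, one intercept per quantile level), not the multiple-output estimator proposed in the paper; the function names `check_loss` and `cqr_loss` and the argument layout are illustrative assumptions.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def cqr_loss(beta, intercepts, X, y, taus):
    """Single-output composite quantile regression loss (illustrative sketch).

    Sums the check loss over all observations and all quantile levels.
    The slope `beta` is shared across levels; each level tau_k has its
    own intercept b_k.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - X @ np.asarray(beta, dtype=float)  # shape (n,)
    total = 0.0
    for b_k, tau_k in zip(intercepts, taus):
        total += check_loss(residuals - b_k, tau_k).sum()
    return float(total)
```

Minimizing `cqr_loss` jointly over `beta` and `intercepts` gives the usual single-output estimator; at the optimum the intercepts act as quantiles of the residuals, which is the object the abstract relates to a Wasserstein distance.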

Paper Structure (22 sections, 27 theorems, 179 equations, 3 figures)

This paper contains 22 sections, 27 theorems, 179 equations, 3 figures.

Key Result

Lemma 1

Under the linear model mlm, we have

Figures (3)

  • Figure 1: Illustration of proofs.
  • Figure 2: Logarithmic average loss, measured in matrix Mahalanobis norm, of the regression coefficient estimated by MCQR, CoorCQR, SpQR and LS for data generated according to the mechanism described in Section \ref{sec: imp}, for various sample sizes $n$, covariate dimensions $p$ and response dimensions $d$, and four different noise distributions (panels (a) to (d)).
  • Figure 3: Logarithmic average estimation loss, measured in matrix Mahalanobis norm, of the regression coefficient estimated by MCQR, CoorCQR, SpQR and LS for data generated according to the mechanism described in Section \ref{sec: imp}, for various outlier contamination proportions (from $0.05$ to $0.5$), covariate dimensions $p$ and response dimensions $d$, and two different noise contamination models. We fix $n=200$.

Theorems & Definitions (53)

  • Lemma 1
  • Lemma 2
  • Proposition 3
  • Definition 4
  • Theorem 5
  • Lemma 6
  • Lemma 7
  • Theorem 8
  • Theorem 9
  • Theorem 10: Kantorovich--Rubinstein theorem
  • ...and 43 more