Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery
Caixing Wang, Ziliang Shen
TL;DR
This work tackles distributed high-dimensional linear quantile regression (QR) by converting the non-smooth QR problem into a smooth least-squares problem through a double-smoothing, Newton-type transformation. The resulting DHSQR framework performs distributed estimation with minimal communication (broadcasting low-dimensional gradients) and solves a Lasso-penalized least-squares problem on a central node, achieving near-oracle rates after a constant number of iterations. The theory establishes a convergence rate of $\mathcal{O}_\mathbb{P}(\sqrt{s\log N / N})$ and, under a beta-min condition and standard regularity assumptions, exact support recovery. Extensive simulations and a real-data HIV drug-sensitivity application demonstrate strong estimation accuracy, robust performance under heavy-tailed and heterogeneous noise, and favorable computation and communication efficiency compared with existing methods. The approach offers scalable, robust distributed quantile regression with provable guarantees for parameter estimation and variable selection in high dimensions.
Abstract
In this paper, we focus on distributed estimation and support recovery for high-dimensional linear quantile regression. Quantile regression is a popular alternative to least squares regression because of its robustness against outliers and data heterogeneity. However, the non-smoothness of the check loss function poses major challenges to both computation and theory in the distributed setting. To tackle these problems, we transform the original quantile regression into a least-squares optimization. By applying a double-smoothing approach, we extend a previous Newton-type distributed approach without the restrictive assumption of independence between the error term and the covariates. An efficient algorithm is developed, which enjoys high computation and communication efficiency. Theoretically, the proposed distributed estimator achieves a near-oracle convergence rate and high support recovery accuracy after a constant number of iterations. Extensive experiments on synthetic examples and a real data application further demonstrate the effectiveness of the proposed method.
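
To make the idea of turning quantile regression into a least-squares problem more concrete, the following is a minimal single-machine sketch, not the authors' DHSQR algorithm. It assumes a Gaussian kernel, so that the smoothed check-loss derivative is $\tau - \Phi(-r/h)$, and it assumes a pseudo-response of the form $\tilde{y}_i = x_i^\top\beta - \{\Phi(-r_i/h) - \tau\}/\hat{f}(0)$, which is then fed to a Lasso-penalized least-squares solver. The function names (`pseudo_response`, `lasso_cd`, `smoothed_qr_fit`), the bandwidth `h`, the penalty `lam`, the zero initialization, and the simulated data are all illustrative assumptions; the distributed aggregation of gradients across machines is omitted.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal CDF, evaluated elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density, evaluated elementwise."""
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def pseudo_response(X, y, beta, tau, h):
    """Assumed Newton-type pseudo-response built from a smoothed indicator."""
    r = y - X @ beta
    grad_term = norm_cdf(-r / h) - tau      # smooth stand-in for 1{r <= 0} - tau
    f_hat = np.mean(norm_pdf(r / h)) / h    # kernel estimate of the residual density at 0
    return X @ beta - grad_term / f_hat

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]          # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]          # add the updated coordinate back
    return b

def smoothed_qr_fit(X, y, tau=0.5, h=0.5, lam=0.1, n_rounds=5):
    """Alternate between pseudo-response construction and a Lasso LS fit."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        y_tilde = pseudo_response(X, y, beta, tau, h)
        beta = lasso_cd(X, y_tilde, lam)
    return beta

# Simulated sparse median regression with heavy-tailed (Student t, 3 df) noise.
rng = np.random.default_rng(0)
n, p = 2000, 50
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_t(3, size=n)

beta_hat = smoothed_qr_fit(X, y, tau=0.5)
print("estimated support:", np.flatnonzero(beta_hat))
print("max abs error:", np.max(np.abs(beta_hat - beta_true)))
```

Because the gradient term is bounded between $-\tau$ and $1-\tau$, large residuals cannot dominate the pseudo-response, which is one way to see why this kind of least-squares reformulation can retain the robustness of quantile regression. In a distributed version, one would expect each machine to contribute only its local gradient vector to the central node, but the exact communication scheme is not reproduced here.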
