A generalized Bayesian approach for high-dimensional robust regression with serially correlated errors and predictors
Saptarshi Chakraborty, Kshitij Khare, George Michailidis
TL;DR
This work tackles high-dimensional robust regression under serially correlated errors by developing a generalized Bayesian framework built on a scaled pseudo-Huber (SPH) loss that adaptively balances $\ell_2$ and $\ell_1$ behavior. The authors formulate an SPH-based likelihood with latent scales and place priors on the regression coefficients (ridge or spike-and-slab) and on the robustness parameter $\alpha$. This enables uncertainty quantification without ad hoc tuning. They prove posterior consistency in both low- and high-dimensional regimes and establish correct recovery of the sparsity pattern under mild dependence assumptions. These results are complemented by extensive simulations and a GDP forecasting application. Empirically, SPH matches or outperforms standard $\ell_1$- and $\ell_2$-loss methods on heavy-, moderate-, and thin-tailed data, while providing calibrated uncertainty and robust variable selection in the presence of serial correlation and contamination. The practical impact is a scalable, robust Bayesian tool for high-dimensional regression in time-series and econometric settings where outliers and dependence are pervasive.
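To make the "balances $\ell_2$ and $\ell_1$ behavior" claim concrete, the sketch below uses the textbook pseudo-Huber loss $\delta^2\left(\sqrt{1 + (r/\delta)^2} - 1\right)$. This is an assumption for illustration only. The paper's *scaled* variant (SPH) is not defined in this section and may differ, and the transition parameter `delta` is our own notation, not necessarily the paper's $\alpha$. The loss behaves like $r^2/2$ for small residuals and like $\delta|r| - \delta^2$ for large ones.

```python
import numpy as np


def pseudo_huber(r, delta):
    """Textbook pseudo-Huber loss: delta^2 * (sqrt(1 + (r/delta)^2) - 1).

    Illustrative assumption only: the paper's *scaled* pseudo-Huber (SPH)
    loss is not reproduced here, and `delta` is hypothetical notation.
    """
    r = np.asarray(r, dtype=float)
    return delta**2 * (np.sqrt(1.0 + (r / delta) ** 2) - 1.0)


if __name__ == "__main__":
    delta = 1.0
    small = np.array([0.01, 0.1])
    large = np.array([100.0, 1000.0])
    # Small residuals: loss is close to r^2 / 2 (quadratic, l2-like).
    print(pseudo_huber(small, delta), small**2 / 2)
    # Large residuals: loss is close to delta*|r| - delta^2 (linear, l1-like).
    print(pseudo_huber(large, delta), delta * np.abs(large) - delta**2)
```

Unlike the Huber loss, this function is smooth everywhere. That smoothness is the property the abstract refers to when it says the SPH loss "smooths the well-known Huber loss."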
Abstract
This paper introduces a loss-based generalized Bayesian methodology for high-dimensional robust regression with serially correlated errors and predictors. The proposed framework employs a novel scaled pseudo-Huber (SPH) loss function, which smooths the well-known Huber loss and balances quadratic ($\ell_2$) and absolute linear ($\ell_1$) loss behavior. This flexibility enables the framework to accommodate both thin-tailed and heavy-tailed data efficiently. The generalized Bayesian approach constructs a working likelihood from the SPH loss, which yields efficient and stable estimation along with rigorous uncertainty quantification for all model parameters. Notably, this approach allows formal statistical inference without ad hoc tuning-parameter selection, while adapting to a wide range of tail behavior in the errors. The framework ensures principled inference by specifying appropriate prior distributions for the regression coefficients, such as ridge priors for small- or moderate-dimensional settings and spike-and-slab priors for high-dimensional settings. We establish rigorous theoretical guarantees for accurate parameter estimation and correct predictor selection under sparsity assumptions, across a wide range of data-generating setups. Extensive simulation studies demonstrate the superior performance of our approach compared to traditional Bayesian regression methods based on $\ell_2$- and $\ell_1$-loss functions. The results highlight its flexibility and robustness, particularly in challenging high-dimensional settings with data contamination.
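The phrase "working likelihood based on the SPH loss" refers to the general generalized Bayesian recipe: replace the likelihood with $\exp(-\eta \sum_i \ell(y_i - x_i^\top\beta))$ and combine it with a prior. The sketch below pairs that recipe with a Gaussian (ridge) prior and samples it with a plain random-walk Metropolis algorithm. It rests on several assumptions and is not the authors' method. The textbook pseudo-Huber loss stands in for SPH. The learning rate `eta`, transition `delta`, and prior variance `tau2` are hypothetical fixed values. The paper's latent scales, prior on $\alpha$, spike-and-slab option, and treatment of serial correlation are all omitted.

```python
import numpy as np


def pseudo_huber(r, delta):
    # Textbook pseudo-Huber loss (stand-in for the paper's SPH loss).
    return delta**2 * (np.sqrt(1.0 + (r / delta) ** 2) - 1.0)


def log_gen_posterior(beta, X, y, delta=1.0, eta=1.0, tau2=10.0):
    """Log of exp(-eta * total loss) times a N(0, tau2 * I) prior, up to a constant.

    delta, eta and tau2 are hypothetical fixed values for illustration only.
    """
    resid = y - X @ beta
    loss = np.sum(pseudo_huber(resid, delta))
    log_prior = -0.5 * np.dot(beta, beta) / tau2  # ridge (Gaussian) prior
    return -eta * loss + log_prior


def rw_metropolis(X, y, n_iter=5000, step=0.05, seed=0, **kw):
    """Random-walk Metropolis targeting the generalized posterior (generic sampler)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    cur = log_gen_posterior(beta, X, y, **kw)
    draws = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        prop = beta + step * rng.standard_normal(beta.shape)
        new = log_gen_posterior(prop, X, y, **kw)
        # Accept with probability min(1, exp(new - cur)).
        if np.log(rng.uniform()) < new - cur:
            beta, cur = prop, new
        draws[t] = beta
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 3
    X = rng.standard_normal((n, p))
    beta_true = np.array([1.0, -2.0, 0.0])
    # Independent heavy-tailed (Student-t, 2 df) noise; no serial correlation here.
    y = X @ beta_true + rng.standard_t(df=2, size=n)
    draws = rw_metropolis(X, y)
    print(draws[1000:].mean(axis=0))  # posterior mean after discarding burn-in
```

A random-walk sampler was chosen only because it needs nothing beyond the log target. It is unlikely to scale to the high-dimensional settings the paper targets.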
