Online and Offline Robust Multivariate Linear Regression
Antoine Godichon-Baggioni, Stephane S. Robin, Laure Sansonnet
TL;DR
This work develops robust estimation methods for multivariate Gaussian linear regression under both Euclidean and Mahalanobis loss criteria. It introduces online stochastic gradient algorithms (with averaging) and offline fixed-point algorithms for each loss, together with ridge-regularized variants, and proves convergence and asymptotic normality under weak conditions. A practical strategy is provided for unknown noise covariance $\Sigma$ via Median Covariation Matrix (MCM) estimation and eigen-decomposition to yield a robust inverse covariance, enabling Mahalanobis-based robust regression in practice. Through extensive simulations, the authors demonstrate substantial robustness gains over classical least-squares across varying contamination levels and dimensions, and show favorable computational efficiency for online methods; all methods are implemented in the R package RobRegression. The theoretical results, empirical findings, and scalable online/offline algorithms offer a principled toolkit for robust multivariate regression and discriminant analysis in streaming contexts.
Abstract
We consider the robust estimation of the parameters of multivariate Gaussian linear regression models. To this aim, we consider robust versions of the usual (Mahalanobis) least-squares criterion, with or without Ridge regularization. We introduce two methods for each considered contrast: (i) online stochastic gradient descent algorithms and their averaged versions and (ii) offline fixed-point algorithms. Under weak assumptions, we prove the asymptotic normality of the resulting estimates. Because the variance matrix of the noise is usually unknown, we propose to plug a robust estimate of it into the Mahalanobis-based stochastic gradient descent algorithms. We show, on synthetic data, the dramatic gain in robustness of the proposed estimates compared to the classical least-squares ones. We also show the computational efficiency of the online versions of the proposed algorithms. All the proposed algorithms are implemented in the R package RobRegression, available on CRAN.
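As an illustration of the online approach, the following is a minimal Python sketch of an averaged stochastic gradient algorithm for a Euclidean robust contrast. It is not the RobRegression implementation. It assumes the model $Y = \beta X + \epsilon$ with $\beta$ a $q \times p$ matrix, the unsquared loss $\mathbb{E}\|Y - \beta X\|$, and a step sequence $\gamma_n = c\, n^{-\alpha}$ with $\alpha \in (1/2, 1)$; the function name, the default values of `c` and `alpha`, and the contamination scheme in the demo are all hypothetical choices made for this example.

```python
import numpy as np

def averaged_sgd_euclidean(X, Y, c=1.0, alpha=0.66):
    """Averaged SGD for the (assumed) contrast E||Y - beta X||.

    X: (n, p) array of regressors, Y: (n, q) array of responses.
    Returns the average of the SGD iterates, a (q, p) matrix.
    """
    n, p = X.shape
    q = Y.shape[1]
    beta = np.zeros((q, p))      # current SGD iterate
    beta_bar = np.zeros((q, p))  # running average of the iterates
    for i in range(n):
        x, y = X[i], Y[i]
        resid = y - beta @ x
        norm = np.linalg.norm(resid)
        if norm > 0.0:
            # The gradient of ||y - beta x|| in beta is -resid x^T / ||resid||,
            # so each observation moves beta by at most gamma * ||x||.
            gamma = c * (i + 1) ** (-alpha)
            beta = beta + gamma * np.outer(resid, x) / norm
        # Recursive update of the mean of the first i + 1 iterates.
        beta_bar = beta_bar + (beta - beta_bar) / (i + 1)
    return beta_bar

# Synthetic demo (hypothetical setup): about 10% of the responses are
# replaced by a large constant value.
rng = np.random.default_rng(0)
n, p, q = 20000, 3, 2
beta_true = np.array([[1.0, -2.0, 0.5],
                      [0.0, 1.5, -1.0]])
X = rng.standard_normal((n, p))
Y = X @ beta_true.T + rng.standard_normal((n, q))
outliers = rng.random(n) < 0.10
Y[outliers] = 50.0

beta_robust = averaged_sgd_euclidean(X, Y)
beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0].T  # classical least squares

err_robust = np.linalg.norm(beta_robust - beta_true)
err_ols = np.linalg.norm(beta_ols - beta_true)
print("Frobenius error, robust averaged SGD:", err_robust)
print("Frobenius error, least squares:", err_ols)
```

Because the gradient is normalized by the residual norm, a single contaminated observation cannot move the iterate by more than $\gamma_n \|X_n\|$, which is the intuition behind the robustness of this type of contrast. The Mahalanobis variant described in the abstract would additionally require a (robust) estimate of the inverse noise covariance, which this sketch does not cover.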
