Provably robust learning of regression neural networks using $β$-divergences
Abhik Ghosh, Suryasis Jana
TL;DR
This work addresses the vulnerability of regression neural networks to outliers and data contamination by introducing rRNet, a robust learning framework based on the β-divergence (density power divergence, DPD). It extends minimum DPD estimation to neural networks, accommodating non-smooth activations and general error densities, and provides convergence guarantees to stationary points for an alternating optimization scheme. The paper establishes local robustness via bounded influence functions for suitable β>0 (depending on the error density) and global robustness via the optimal 50% asymptotic breakdown point for β∈(0,1], while showing that maximum likelihood training (β=0) is not robust. In simulations and real-data experiments, rRNet shows improved stability and predictive performance over existing approaches for regression under contamination and noise.
Abstract
Regression neural networks (NNs) are most commonly trained by minimizing the mean squared prediction error, which is highly sensitive to outliers and data contamination. Existing robust training methods for regression NNs are often limited in scope and rely primarily on empirical validation, with only a few offering partial theoretical guarantees. In this paper, we propose a new robust learning framework for regression NNs based on the $β$-divergence (also known as the density power divergence), which we call `rRNet'. It applies to a broad class of regression NNs, including models with non-smooth activation functions and error densities, and recovers classical maximum likelihood learning as a special case. The rRNet is implemented via an alternating optimization scheme, for which we establish convergence guarantees to stationary points under mild, verifiable conditions. The (local) robustness of rRNet is theoretically characterized through the influence functions of both the parameter estimates and the resulting rRNet predictor, which are shown to be bounded for suitable choices of the tuning parameter $β$, depending on the error density. We further prove that rRNet attains the optimal 50\% asymptotic breakdown point at the assumed model for all $β\in(0, 1]$, providing a strong global robustness guarantee that is largely absent from existing NN learning methods. Our theoretical results are complemented by simulation experiments and real-data analyses, illustrating the practical advantages of rRNet over existing approaches in both function approximation problems and prediction tasks with noisy observations.
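To make the contrast between $β$-divergence and likelihood-based training concrete, the following is a minimal sketch of the standard minimum density power divergence objective, written here under the assumption of Gaussian errors with a known scale. It is not the authors' implementation: the function names `dpd_loss_gaussian` and `gaussian_nll`, the Gaussian error model, and the added constant $1/β$ (included only so that the loss approaches the negative log-likelihood as $β \to 0$) are illustrative assumptions. The residuals stand in for the differences between observed responses and the outputs of any regression NN.

```python
import math


def dpd_loss_gaussian(residuals, sigma, beta):
    """Average density power divergence loss, assuming N(0, sigma^2) errors.

    Per observation: integral of f^(1+beta) minus (1 + 1/beta) * f(r)^beta,
    plus the constant 1/beta (an illustrative choice, see lead-in).
    """
    if beta <= 0 or sigma <= 0:
        raise ValueError("beta and sigma must be positive")
    # (2 * pi * sigma^2)^(-beta / 2), shared by both terms below
    scale = (2.0 * math.pi * sigma ** 2) ** (-beta / 2.0)
    # Closed form of the integral of f^(1+beta) for a normal density
    integral_term = scale / math.sqrt(1.0 + beta)
    total = 0.0
    for r in residuals:
        # f(r)^beta: decays to zero for large |r|, so outliers are downweighted
        density_power = scale * math.exp(-beta * r * r / (2.0 * sigma ** 2))
        total += integral_term - (1.0 + 1.0 / beta) * density_power + 1.0 / beta
    return total / len(residuals)


def gaussian_nll(residuals, sigma):
    """Average Gaussian negative log-likelihood (squared error up to constants)."""
    const = 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const + r * r / (2.0 * sigma ** 2) for r in residuals) / len(residuals)


if __name__ == "__main__":
    clean = [0.1, -0.2, 0.05, 0.3]
    contaminated = clean + [50.0]  # one gross outlier
    for name, data in (("clean", clean), ("contaminated", contaminated)):
        print(name,
              "DPD(beta=0.5):", dpd_loss_gaussian(data, 1.0, 0.5),
              "NLL:", gaussian_nll(data, 1.0))
```

In this sketch, the contribution of a single residual to the DPD loss levels off as the residual grows, whereas its contribution to the negative log-likelihood grows quadratically. This is the intuition behind the bounded influence of outliers for $β > 0$; the formal influence function and breakdown results are those established in the paper, not by this example.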
