Provably robust learning of regression neural networks using $β$-divergences

Abhik Ghosh; Suryasis Jana

TL;DR

This work addresses the vulnerability of regression neural networks to outliers and data contamination by introducing rRNet, a robust learning framework based on the β-divergence (density power divergence, DPD). It extends minimum DPD estimation to neural networks, accommodating non-smooth activations and general error densities, with convergence guarantees for an alternating optimization scheme. The paper establishes local robustness via influence functions that are bounded for suitable β>0 (depending on the error density), and global robustness via the optimal 50% asymptotic breakdown point for all β∈(0,1], while showing that ML-based training (β=0) is not robust. In simulations and real-data experiments, rRNet shows improved stability and predictive performance over existing robust methods for regression tasks under contamination and noise.

Abstract

Regression neural networks (NNs) are most commonly trained by minimizing the mean squared prediction error, which is highly sensitive to outliers and data contamination. Existing robust training methods for regression NNs are often limited in scope and rely primarily on empirical validation, with only a few offering partial theoretical guarantees. In this paper, we propose a new robust learning framework for regression NNs based on the $β$-divergence (also known as the density power divergence), which we call `rRNet'. It applies to a broad class of regression NNs, including models with non-smooth activation functions and error densities, and recovers classical maximum likelihood learning as a special case. The rRNet is implemented via an alternating optimization scheme, for which we establish convergence guarantees to stationary points under mild, verifiable conditions. The (local) robustness of rRNet is theoretically characterized through the influence functions of both the parameter estimates and the resulting rRNet predictor, which are shown to be bounded for suitable choices of the tuning parameter $β$, depending on the error density. We further prove that rRNet attains the optimal 50\% asymptotic breakdown point at the assumed model for all $β\in(0, 1]$, providing a strong global robustness guarantee that is largely absent for existing NN learning methods. Our theoretical results are complemented by simulation experiments and real-data analyses, illustrating practical advantages of rRNet over existing approaches in both function approximation problems and prediction tasks with noisy observations.
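To make the contrast with mean squared error concrete, the following is a minimal sketch of the empirical DPD loss in the Gaussian-error case, written as a function of the residuals $y_i - \mu(x_i)$. It uses the standard closed form of the density power divergence objective for an $N(0, \sigma^2)$ error density with $β > 0$; it is not the authors' implementation, and the function names `gaussian_dpd_loss` and `mse_loss` as well as the toy residuals are assumptions made only for illustration.

```python
import numpy as np


def gaussian_dpd_loss(residuals, sigma, beta):
    """Empirical DPD loss for N(0, sigma^2) errors (illustrative sketch, beta > 0).

    Per observation the DPD objective is
        integral of f^(1+beta)  -  (1 + 1/beta) * f(y_i)^beta,
    where f is the Gaussian density centred at the network prediction.
    """
    if beta <= 0 or sigma <= 0:
        raise ValueError("this sketch requires beta > 0 and sigma > 0")
    r = np.asarray(residuals, dtype=float)
    # (2*pi*sigma^2)^(-beta/2) is common to both terms.
    norm = (2.0 * np.pi * sigma**2) ** (-beta / 2.0)
    # Closed form of the integral of f^(1+beta) for a Gaussian density.
    integral_term = norm / np.sqrt(1.0 + beta)
    # Each residual enters only through exp(-beta r^2 / (2 sigma^2)) in (0, 1].
    data_term = (1.0 + 1.0 / beta) * norm * np.mean(
        np.exp(-beta * r**2 / (2.0 * sigma**2))
    )
    return integral_term - data_term


def mse_loss(residuals):
    """Mean squared error of the residuals, for comparison."""
    r = np.asarray(residuals, dtype=float)
    return np.mean(r**2)


# Hypothetical residuals: a clean set, and the same set with one gross outlier.
clean = np.array([0.1, -0.2, 0.05, 0.3, -0.15])
contaminated = clean.copy()
contaminated[1] = 1e6

print("MSE shift:", mse_loss(contaminated) - mse_loss(clean))
print("DPD shift:", gaussian_dpd_loss(contaminated, 1.0, 0.5)
      - gaussian_dpd_loss(clean, 1.0, 0.5))
```

Because each residual enters the DPD loss only through a factor in $(0, 1]$, a single observation can shift the loss by at most $(1 + 1/β)(2\pi\sigma^2)^{-β/2}/n$ however extreme it is, whereas its contribution to the mean squared error grows without bound.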

Paper Structure (33 sections, 13 theorems, 79 equations, 9 figures, 9 tables, 2 algorithms)

Introduction
The rRNet: Robust learning of regression neural networks
Regression NN: Model setup and notation
DPD-based robust learning via rRNet
Population-Level Target and Identifiability of rRNet
Optimization dynamics of the rRNet
Convergence guarantees of Algorithm \ref{Alg-dpd-nn}
The sub-optimization problem with respect to $\boldsymbol{\theta}$
The sub-optimization problem with respect to $\sigma$
Local robustness guarantees: Influence functions
IFs under smooth activation functions and error densities
Extending the IF for non-smooth models
Global robustness guarantees: Breakdown Analysis
Empirical illustrations
An implementation of the rRNet for Gaussian noise
...and 18 more sections

Key Result

Theorem 2.1

Suppose Assumptions (N0) and (A0) hold for a given $\beta \geq 0$. Then, in either the fixed or random design case, the (conditional) population-level DPD-loss $\mathcal{L}_{n,\beta}^*(\bm{\theta},\sigma)$, defined in \eqref{dpd-loss-pop}, satisfies ... In particular, the population-level rRNet objective admits a unique minimizer in function space (the target rRNet predictor $\mu_{n,\beta}^*$), together with ...

Figures (9)

Figure 1: The fully connected MLP with 2 hidden layers with the respective number of hidden nodes being $K_1, K_2$, and the activation functions being $\phi_1, \phi_2$. The output layer has linear activation.
Figure 2: IFs of the MDPDFs and the rRNet predictors for a simple MLP, with sigmoid activation and Gaussian error, under contamination in the 2nd observation. [The case $\beta=0$ represents the standard LSE-based training.]
Figure 3: IFs of the MDPDFs and the rRNet predictors for a simple MLP, with ReLU activation and Gaussian error, under contamination in the 2nd observation. [The case $\beta=0$ represents the standard LSE-based training.]
Figure 4: Plots of Functions 1-6 ($\varphi_1,\ldots,\varphi_6$), along with one simulated dataset with illustrative contamination.
Figure S1: IFs of the MDPDFs and the rRNet predictors for a simple MLP, with sigmoid activation and Gaussian error, under contamination in the 49th observation. [The case $\beta=0$ represents the standard LSE-based training.]
...and 4 more figures

Theorems & Definitions (18)

Example 2.1: Gaussian error case
Remark 2.1
Theorem 2.1: Identifiability of the rRNet objective
Theorem 3.1
Proposition 3.1
Theorem 4.1
Corollary 4.1.1
Corollary 4.1.2
Example 4.1: Shallow NN with Gaussian error and sigmoid activation
Corollary 4.1.3
...and 8 more
