Outlier-Robust Training of Machine Learning Models

Rajat Talak; Charis Georgiou; Jingnan Shi; Luca Carlone

TL;DR

This work addresses training ML models in the presence of arbitrary outliers by bridging two approaches to robust loss design: robust estimation (M-estimation) and risk minimization in deep learning. A modified Black-Rangarajan duality unifies the two and yields a definition of a robust loss kernel σ. From this duality the authors derive the Adaptive Alternation Algorithm (AAA), which alternates between weighted loss minimization and weight updates, with a data-driven parameter update that interprets the weights as inlier probabilities and avoids complex parameter tuning. The authors prove that the robust kernel expands the region of convergence and reduces gradient variance under outliers, and validate the approach on linear regression, image classification with noisy labels, and neural scene reconstruction, including NeRF-style experiments with up to 80% outliers. The paper also discusses connections to conformal prediction and graduated non-convexity, and the implementation code is released for reproducibility.

Abstract

Robust training of machine learning models in the presence of outliers has garnered attention across various domains. The use of robust losses is a popular approach and is known to mitigate the impact of outliers. We bring to light two literatures that have diverged in their ways of designing robust losses: one using M-estimation, which is popular in robotics and computer vision, and another using a risk-minimization framework, which is popular in deep learning. We first show that a simple modification of the Black-Rangarajan duality provides a unifying view. The modified duality brings out a definition of a robust loss kernel $\sigma$ that is satisfied by robust losses in both literatures. Secondly, using the modified duality, we propose an Adaptive Alternation Algorithm (AAA) for training machine learning models with outliers. The algorithm iteratively trains the model by using a weighted version of the non-robust loss, while updating the weights at each iteration. The algorithm is augmented with a novel parameter update rule that interprets the weights as inlier probabilities and obviates the need for complex parameter tuning. Thirdly, we investigate convergence of the adaptive alternation algorithm to outlier-free optima. Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels $\sigma$ increases the region of convergence. We experimentally show the efficacy of our algorithm on regression, classification, and neural scene reconstruction problems. We release our implementation code: https://github.com/MIT-SPARK/ORT.
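The alternation described in the abstract (fit with a weighted non-robust loss, then update the weights) can be illustrated on a toy linear regression. The following is a minimal sketch under stated assumptions, not the released implementation: it uses Geman-McClure weights with a fixed scale `c`, whereas the paper's adaptive parameter update rule is not reproduced here. The names `gm_weights` and `alternation_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def gm_weights(sq_residuals, c=1.0):
    """Geman-McClure weights for squared residuals (scale c is an assumed, fixed value)."""
    return (c / (c + sq_residuals)) ** 2

def alternation_fit(X, y, c=1.0, n_iters=30):
    """Alternate between weighted least squares and closed-form weight updates."""
    weights = np.ones(len(y))           # start by treating every sample as an inlier
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Step 1: minimize the weighted (non-robust) squared loss over theta.
        sw = np.sqrt(weights)
        theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        # Step 2: update the weights from the current squared residuals.
        sq_residuals = (X @ theta - y) ** 2
        weights = gm_weights(sq_residuals, c)
    return theta, weights

# Synthetic data (assumption): 30% of the labels receive a large positive offset.
rng = np.random.default_rng(0)
n, n_out = 200, 60
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([x, np.ones(n)])
theta_true = np.array([2.0, -1.0])
y = X @ theta_true + 0.05 * rng.standard_normal(n)
y[:n_out] += rng.uniform(10.0, 30.0, size=n_out)

theta_hat, weights = alternation_fit(X, y)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # plain least squares for comparison
print("alternation:", theta_hat, "least squares:", theta_ls)
```

In this sketch the weights of the corrupted samples shrink toward zero after a few iterations, so the weighted fit is driven by the clean samples, while plain least squares is pulled toward the outliers.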

Paper Structure

This paper contains 36 sections, 11 theorems, 68 equations, 6 figures, 1 table, 1 algorithm.

Introduction
Contribution
Organization
Background: Diverging Principles of Robust Loss Design
Robust Estimation in Robotics and Computer Vision
Training Deep Learning Models in the Presence of Outliers
Problem Statement
Unified Robust Loss Kernel
Modified Black-Rangarajan Duality
Unified Robust Loss Kernel
Adaptive Alternation Algorithm
Alternation Algorithm
Parameter Update
Theoretical Analysis
Assumption on Outliers
...and 21 more sections

Key Result

Theorem 1

The robust estimation problem (eq:m-est) is equivalent to the weighted non-linear least squares problem (eq:weighted-nlse) with $\Psi_{\rho}(u) = - u (\phi')^{-1}(u) + \phi( (\phi')^{-1}(u))$ and $\phi(r) = \rho(\sqrt{r})$, provided $\phi(r)$ satisfies: (i) $\phi'(r) \rightarrow 1$ as $r \downarrow 0$,
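
As an illustrative worked example (not taken from the paper), consider the Geman-McClure loss with unit scale, $\rho(x) = x^2/(1+x^2)$, so that $\phi(r) = r/(1+r)$. This sketch assumes the weighted problem has the usual Black-Rangarajan form $\min_{u} \, u\, r + \Psi_{\rho}(u)$, with $r$ the squared residual and $u \in (0, 1]$. Condition (i) holds, and the formula for $\Psi_{\rho}$ gives:

$$
\phi'(r) = \frac{1}{(1+r)^2} \rightarrow 1 \ \text{as} \ r \downarrow 0, \qquad (\phi')^{-1}(u) = \frac{1}{\sqrt{u}} - 1,
$$

$$
\Psi_{\rho}(u) = -u\left(\frac{1}{\sqrt{u}} - 1\right) + \left(1 - \sqrt{u}\right) = \left(\sqrt{u} - 1\right)^2 .
$$

Minimizing $u\, r + (\sqrt{u} - 1)^2$ over $u$ gives $u^{\star} = 1/(1+r)^2 = \phi'(r)$, and substituting back yields $r/(1+r)^2 + r^2/(1+r)^2 = r/(1+r) = \phi(r)$, so the weighted problem recovers the robust loss in this case.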

Figures (6)

Figure 1: Nerfacto (Tancik23siggraph-nerfstudio) reconstruction results after $80\%$ of the training pixels have been perturbed by outliers. (left) Training with the original Adam optimizer. (middle) Training with our Adaptive Alternation Algorithm with Truncated Loss. (right) Ground truth.
Figure 2: Trajectory of (a) SGD (batch size = 1), (b) Adaptive Alternation Algorithm with Truncated Loss (batch size = 1), and (c) Gradient Descent, for a linear regression problem with zero-mean outliers. The presence of outliers in the training data introduces large perturbations into SGD. Our algorithm stabilizes the descent, and the variance in the gradient estimate is lower (Lemma \ref{lem:training-algo-variance}). We observe its behavior to be close to that of full gradient descent, where the gradient estimate is exact, given zero-mean outliers.
Figure 3: (a) Test accuracy (i.e., RMSE on test data) as a function of outlier fraction $\lambda$ in the training data. The figure shows the gradient descent (GD) algorithm, the stochastic gradient descent (SGD) algorithm, and two adaptive alternation algorithms: Adaptive GM and Adaptive TL. (b) Test classification accuracy as a function of outlier fraction $\lambda$ in the training data. The figure shows SGD, Normalized Gradient Descent, Gradient Clipping, and the three adaptive alternation algorithms: Adaptive GM, Adaptive TL, and Adaptive-T GM.
Figure 4: Test accuracy (PSNR $\uparrow$ and LPIPS $\downarrow$) of the trained model as a function of the percentage of outliers in the training data for various training algorithms: (i) Adam / SGD, the baseline approach proposed for training without outliers; (ii) Gradient Clipping; (iii) Normalized Gradient; (iv) Adaptive TL; (v) Adaptive GM; and (vi) Adaptive-T GM.
Figure 5: Plot of the 1D training loss landscape as interpolated between the Adaptive TL model weights and the vanilla Adam model weights.
...and 1 more figure

Theorems & Definitions (32)

Theorem 1: Black96ijcv-unification
Remark 2: Risk Minimization Framework and Robust Losses
Remark 3: Convergence and Robust Loss Design
Corollary 4: Modified Black-Rangarajan Duality
proof
Remark 5: Dual Problem Structure and its Application
Definition 6: Robust Loss Kernel $\sigma$
Lemma 7
Remark 8: Parameter Update and Graduated Non-Convexity
Remark 9: Iteratively Trimmed Loss Minimization
...and 22 more
