Outlier-Robust Training of Machine Learning Models

Rajat Talak, Charis Georgiou, Jingnan Shi, Luca Carlone

TL;DR

This work addresses training ML models in the presence of arbitrary outliers by bridging two robustness paradigms, robust estimation (M-estimation) and risk minimization in deep learning, via a modified Black-Rangarajan duality. It defines a unified robust loss kernel σ and derives the Adaptive Alternation Algorithm (AAA), which alternates between weighted loss minimization and a data-driven weight update that avoids complex parameter tuning. The authors prove that the robust kernel expands the region of convergence and reduces gradient variance under outliers, and validate the approach on linear regression, image classification with noisy labels, and neural scene reconstruction, including NeRF-style experiments with up to 80% outliers. The paper also discusses connections to conformal prediction and graduated non-convexity, and releases its implementation code for reproducibility.

Abstract

Robust training of machine learning models in the presence of outliers has garnered attention across various domains. The use of robust losses is a popular approach and is known to mitigate the impact of outliers. We bring to light two literatures that have diverged in their ways of designing robust losses: one using M-estimation, which is popular in robotics and computer vision, and another using a risk-minimization framework, which is popular in deep learning. We first show that a simple modification of the Black-Rangarajan duality provides a unifying view. The modified duality brings out a definition of a robust loss kernel $\sigma$ that is satisfied by robust losses in both literatures. Secondly, using the modified duality, we propose an Adaptive Alternation Algorithm (AAA) for training machine learning models with outliers. The algorithm iteratively trains the model by using a weighted version of the non-robust loss, while updating the weights at each iteration. The algorithm is augmented with a novel parameter update rule by interpreting the weights as inlier probabilities, and obviates the need for complex parameter tuning. Thirdly, we investigate convergence of the adaptive alternation algorithm to outlier-free optima. Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels $\sigma$ increases the region of convergence. We experimentally show the efficacy of our algorithm on regression, classification, and neural scene reconstruction problems. We release our implementation code: https://github.com/MIT-SPARK/ORT.
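The alternation described in the abstract (minimize a weighted non-robust loss, then update the weights) can be illustrated on a toy linear regression. The sketch below is not the authors' AAA: it uses the classical Geman-McClure weight formula with a fixed scale `mu`, whereas the adaptive parameter update rule is not given in this summary. The function names, the value of `mu`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gm_weights(residuals, mu):
    """Geman-McClure weights w = (mu / (mu + r^2))^2, each in (0, 1]."""
    return (mu / (mu + residuals ** 2)) ** 2

def alternating_fit(X, y, mu=1.0, iters=50):
    """Alternate between weighted least squares and weight updates.

    Hypothetical helper: mu is held fixed here, unlike an adaptive scheme.
    """
    w = np.ones(len(y))
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        # Step 1: minimize the weighted non-robust (squared) loss.
        sw = np.sqrt(w)
        theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        # Step 2: update the weights from the current residuals.
        w = gm_weights(y - X @ theta, mu)
    return theta, w

# Synthetic data: a line with small noise and 30% gross, one-sided outliers.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.uniform(-1, 1, n), np.ones(n)])
theta_true = np.array([2.0, -1.0])
y = X @ theta_true + 0.01 * rng.normal(size=n)
outliers = rng.choice(n, size=60, replace=False)
y[outliers] += rng.uniform(5, 10, size=60)

theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # plain least squares
theta_robust, w = alternating_fit(X, y, mu=1.0)    # alternating scheme

print("least squares error:", np.linalg.norm(theta_ls - theta_true))
print("alternating error:  ", np.linalg.norm(theta_robust - theta_true))
```

In this toy setting the weights of the perturbed samples shrink toward zero, which matches the abstract's reading of the weights as inlier probabilities. With a fixed `mu` the result depends on how `mu` compares to the inlier noise level, which is the kind of tuning the adaptive rule is meant to avoid.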

Paper Structure

This paper contains 36 sections, 11 theorems, 68 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

The robust estimation problem (eq:m-est) is equivalent to the weighted non-linear least squares problem (eq:weighted-nlse) with $\Psi_{\rho}(u) = - u (\phi')^{-1}(u) + \phi( (\phi')^{-1}(u))$ and $\phi(r) = \rho(\sqrt{r})$, provided $\phi(r)$ satisfies: (i) $\phi'(r) \rightarrow 1$ as $r \downarrow 0$,
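
As a worked instance of the formula for $\Psi_{\rho}$, consider the Geman-McClure loss with scale parameter $\mu > 0$. This loss is an illustrative choice, not part of the truncated theorem statement, and only the visible condition (i) is checked here.

$$
\begin{aligned}
\rho(x) &= \frac{\mu x^2}{\mu + x^2}, \qquad \phi(r) = \rho(\sqrt{r}) = \frac{\mu r}{\mu + r}, \\
\phi'(r) &= \frac{\mu^2}{(\mu + r)^2} \rightarrow 1 \text{ as } r \downarrow 0, \\
(\phi')^{-1}(u) &= \mu\left(\frac{1}{\sqrt{u}} - 1\right), \qquad u \in (0, 1], \\
\Psi_{\rho}(u) &= -u\,\mu\left(\frac{1}{\sqrt{u}} - 1\right) + \mu\,(1 - \sqrt{u}) = \mu\,(\sqrt{u} - 1)^2 .
\end{aligned}
$$

The weighted problem itself is not reproduced here. Assuming it takes the standard Black-Rangarajan form $\min \sum_i w_i r_i^2 + \Psi_{\rho}(w_i)$, setting the derivative with respect to $w$ to zero gives $w = \phi'(r^2) = \mu^2 / (\mu + r^2)^2$, so a residual much larger than $\sqrt{\mu}$ receives a weight close to zero.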

Figures (6)

  • Figure 1: Nerfacto [Tancik23siggraph-nerfstudio] reconstruction results after $80\%$ of the training pixels have been perturbed by outliers. (left) Training with the original Adam optimizer. (middle) Training with our Adaptive Alternation Algorithm with Truncated Loss. (right) Ground truth.
  • Figure 2: Trajectory of (a) SGD (batch size = 1), (b) Adaptive Alternation Algorithm with Truncated Loss (batch size = 1), and (c) Gradient Descent, for a linear regression problem with zero-mean outliers. The presence of outliers in the training data introduces large perturbations into SGD. Our algorithm stabilizes the descent, and the variance in the gradient estimate is lower (Lemma \ref{lem:training-algo-variance}). We observe its behavior to be close to that of full gradient descent, where the gradient estimate is exact, given zero-mean outliers.
  • Figure 3: (a) Test accuracy (i.e., RMSE on test data) as a function of outlier fraction $\lambda$ in the training data. The figure shows the gradient descent (GD) algorithm, the stochastic gradient descent (SGD) algorithm, and two adaptive alternation algorithms: Adaptive GM and Adaptive TL. (b) Test classification accuracy as a function of outlier fraction $\lambda$ in the training data. The figure shows SGD, Normalized Gradient Descent, Gradient Clipping, and the three adaptive alternation algorithms: Adaptive GM, Adaptive TL, and Adaptive-T GM.
  • Figure 4: Test accuracy (PSNR $\uparrow$ and LPIPS $\downarrow$) of the trained model as a function of % outliers in the training data for various training algorithms: (i) Adam / SGD, the baseline approach proposed for training without outliers; (ii) Gradient Clipping; (iii) Normalized Gradient; (iv) Adaptive TL; (v) Adaptive GM; and (vi) Adaptive-T GM.
  • Figure 5: Plot of the 1D training loss landscape as interpolated between the Adaptive TL model weights and the vanilla Adam model weights.
  • ...and 1 more figure

Theorems & Definitions (32)

  • Theorem 1: [Black96ijcv-unification]
  • Remark 2: Risk Minimization Framework and Robust Losses
  • Remark 3: Convergence and Robust Loss Design
  • Corollary 4: Modified Black-Rangarajan Duality
  • proof
  • Remark 5: Dual Problem Structure and its Application
  • Definition 6: Robust Loss Kernel $\sigma$
  • Lemma 7
  • Remark 8: Parameter Update and Graduated Non-Convexity
  • Remark 9: Iteratively Trimmed Loss Minimization
  • ...and 22 more