Outlier-Robust Training of Machine Learning Models
Rajat Talak, Charis Georgiou, Jingnan Shi, Luca Carlone
TL;DR
This work addresses training ML models in the presence of arbitrary outliers by bridging two robustness paradigms, robust estimation (M-estimation) and risk minimization in deep learning, via a modified Black-Rangarajan duality. It defines a unified robust loss kernel σ and derives the Adaptive Alternation Algorithm (AAA), which alternates between weighted loss minimization and weight updates, with a data-driven parameter update rule that avoids complex parameter tuning. The authors prove that the robust kernel expands the region of convergence and reduces gradient variance under outliers, and validate the approach on linear regression, image classification with noisy labels, and neural scene reconstruction, including NeRF-style experiments with up to 80% outliers. The paper also discusses connections to conformal prediction and graduated non-convexity, and the authors release their implementation code.
Abstract
Robust training of machine learning models in the presence of outliers has garnered attention across various domains. Robust losses are a popular approach and are known to mitigate the impact of outliers. We bring to light two bodies of literature that have diverged in how they design robust losses: one based on M-estimation, which is popular in robotics and computer vision, and another based on a risk-minimization framework, which is popular in deep learning. First, we show that a simple modification of the Black-Rangarajan duality provides a unifying view. The modified duality yields a definition of a robust loss kernel $σ$ that is satisfied by robust losses in both literatures. Second, using the modified duality, we propose an Adaptive Alternation Algorithm (AAA) for training machine learning models with outliers. The algorithm iteratively trains the model using a weighted version of the non-robust loss, updating the weights at each iteration. It is augmented with a novel parameter update rule, obtained by interpreting the weights as inlier probabilities, which obviates the need for complex parameter tuning. Third, we investigate convergence of the adaptive alternation algorithm to outlier-free optima. Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels $σ$ increases the region of convergence. We experimentally show the efficacy of our algorithm on regression, classification, and neural scene reconstruction problems. We release our implementation code: https://github.com/MIT-SPARK/ORT.
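
To make the alternation idea concrete, the sketch below runs a generic weighted-loss/weight-update loop on a toy linear regression problem. It is not the authors' AAA and does not use the modified duality or the adaptive parameter update rule described in the abstract. It relies only on the classical Black-Rangarajan duality, $\rho(r) = \min_{w \in [0,1]} w r^2 + \Phi_\rho(w)$, instantiated with the Geman-McClure loss, whose closed-form weight update is $w = (c^2 / (c^2 + r^2))^2$. The function names, the fixed scale `c`, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def geman_mcclure_weights(residuals, c):
    """Map residuals to weights in (0, 1]: near 1 for small residuals, near 0 for large ones."""
    return (c**2 / (c**2 + residuals**2)) ** 2


def alternating_robust_regression(X, y, c=1.0, num_iters=30):
    """Alternate between weighted least squares and a closed-form weight update.

    `c` is a fixed, hand-chosen scale (an assumption of this sketch); the
    abstract's algorithm instead adapts its parameter during training.
    """
    weights = np.ones(X.shape[0])  # start by treating every sample as an inlier
    for _ in range(num_iters):
        # Step 1: minimize the weighted (non-robust) squared loss for fixed weights.
        sqrt_w = np.sqrt(weights)
        theta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
        # Step 2: recompute the weights from the current residuals.
        residuals = y - X @ theta
        weights = geman_mcclure_weights(residuals, c)
    return theta, weights


# Synthetic demo (hypothetical data): a line y = 2x - 1 with 20% corrupted targets.
rng = np.random.default_rng(0)
n, n_outliers = 200, 40
X = np.column_stack([rng.uniform(-1.0, 1.0, n), np.ones(n)])
theta_true = np.array([2.0, -1.0])
y = X @ theta_true + 0.01 * rng.standard_normal(n)
outlier_idx = rng.choice(n, size=n_outliers, replace=False)
y[outlier_idx] += rng.uniform(5.0, 10.0, size=n_outliers)

theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # plain least squares baseline
theta_robust, weights = alternating_robust_regression(X, y)

print("least-squares error:", np.linalg.norm(theta_ls - theta_true))
print("robust error:       ", np.linalg.norm(theta_robust - theta_true))
```

In this toy setting the final weights can be read loosely as inlier scores: corrupted samples end up with weights close to zero, so they barely influence the weighted fit. How well a fixed `c` works depends on the noise scale, which is the kind of tuning the abstract's adaptive rule is meant to avoid.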
