Adaptive Robust Learning using Latent Bernoulli Variables
Aleksandr Karakulev, Dave Zachariah, Prashant Singh
TL;DR
This work addresses learning from arbitrarily corrupted datasets by introducing RLVI (Robust Learning via Variational Inference), which uses latent Bernoulli indicators $t_i$ to separate clean from corrupted samples under the $\varepsilon$-contaminated model. Variational inference handles the marginalization over $t_i$ by optimizing an evidence lower bound (ELBO), which yields per-sample weights $\pi_i$ that adjust the loss and give an automatic estimate of the corruption level, $\varepsilon = 1 - \frac{1}{n}\sum_i \pi_i$. The method covers standard parametric learning, online learning, and deep networks, using a stochastic EM variant and a truncation-based regularization that prevents overfitting in overparameterized models. In experiments on linear and logistic regression, PCA, online human activity recognition (HAR), and CNNs trained on corrupted data, RLVI achieves higher accuracy or recall than state-of-the-art robust methods, with only modest computational overhead and without explicit tuning of the noise level. Overall, RLVI is a parameter-free, scalable approach to robust learning that applies broadly to practical noisy-data settings.
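As a rough illustration of how the weights are used, the sketch below computes the corruption estimate $\varepsilon = 1 - \frac{1}{n}\sum_i \pi_i$ from a set of weights and applies the same weights to per-sample losses. The function names, the example values, and the choice of a plain weighted sum as the adjusted loss are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def corruption_level(weights: Sequence[float]) -> float:
    """Return epsilon = 1 - (1/n) * sum_i pi_i for per-sample weights pi_i."""
    if not weights:
        raise ValueError("weights must be non-empty")
    return 1.0 - sum(weights) / len(weights)


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of per-sample losses, each scaled by its weight pi_i.

    Assumption: the weights enter the objective as a plain weighted sum.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, losses))


# Hypothetical values: three samples judged clean, one judged corrupted.
pi = [1.0, 1.0, 1.0, 0.0]
losses = [0.2, 0.1, 0.3, 9.0]

eps = corruption_level(pi)         # one of four samples is down-weighted to zero
total = weighted_loss(losses, pi)  # the large loss of the last sample is ignored
print(f"estimated corruption level: {eps:.2f}, weighted loss: {total:.2f}")
```

With weights close to zero on corrupted samples, their large losses contribute little to the objective, and the average weight doubles as an estimate of the clean fraction.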
Abstract
We present an adaptive approach for robust learning from corrupted training sets. We identify corrupted and non-corrupted samples with latent Bernoulli variables and thus formulate the learning problem as maximization of the likelihood where latent variables are marginalized. The resulting problem is solved via variational inference, using an efficient Expectation-Maximization-based method. The proposed approach improves over the state-of-the-art by automatically inferring the corruption level, while adding minimal computational overhead. We demonstrate our robust learning method and its parameter-free nature on a wide variety of machine learning tasks, including online learning and deep learning, where it adapts to different levels of noise and maintains high prediction accuracy.
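To make the idea of latent Bernoulli indicators combined with Expectation-Maximization concrete, here is a minimal toy sketch. It is not the algorithm of this paper: it is the classical EM for a two-component contamination mixture, and it assumes a known flat outlier density, whereas the proposed approach targets arbitrary corruption. The function `em_contaminated_mean`, the `outlier_density` parameter, and the synthetic data are all hypothetical choices for illustration.

```python
import math
import random


def em_contaminated_mean(xs, outlier_density, n_iter=50):
    """Classical EM for a unit-variance Gaussian plus a flat outlier component.

    Assumption (not from the paper): corrupted samples follow a known constant
    density `outlier_density`. Returns (mu, pi, eps): the inlier mean, the
    per-sample probabilities of being clean, and the estimated corruption level.
    """
    n = len(xs)
    mu = sorted(xs)[n // 2]  # start from the median, a crude robust guess
    eps = 0.5                # initial corruption level
    pi = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each sample is clean (t_i = 1).
        pi = []
        for x in xs:
            clean = (1.0 - eps) * math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)
            corrupt = eps * outlier_density
            pi.append(clean / (clean + corrupt))
        # M-step: weighted mean, then corruption level from the average weight.
        total = sum(pi)
        mu = sum(p * x for p, x in zip(pi, xs)) / total
        eps = 1.0 - total / n
    return mu, pi, eps


# Synthetic data: 800 clean samples around 0 and 200 corrupted samples far away.
rng = random.Random(0)
clean_part = [rng.gauss(0.0, 1.0) for _ in range(800)]
corrupted_part = [rng.uniform(5.0, 15.0) for _ in range(200)]
data = clean_part + corrupted_part

# The flat density 1/30 corresponds to a uniform outlier model on [-15, 15].
mu, pi, eps = em_contaminated_mean(data, outlier_density=1.0 / 30.0)
print(f"estimated mean: {mu:.3f}, estimated corruption level: {eps:.3f}")
```

In this toy setting the corruption level is inferred from the data rather than supplied by the user, which mirrors the adaptive behaviour described above. The estimate should land in the vicinity of the true 20% contamination, with some bias because the assumed outlier density does not match the actual corruption.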
