Adaptive Robust Learning using Latent Bernoulli Variables

Aleksandr Karakulev, Dave Zachariah, Prashant Singh

TL;DR

This work tackles learning from datasets with arbitrary corruption by introducing RLVI, which uses latent Bernoulli indicators $t_i$ to separate corrupted from clean samples under the $\varepsilon$-contaminated model. By applying variational inference, RLVI marginalizes over $t_i$ and optimizes an evidence lower bound (ELBO), yielding sample weights $\pi_i$ that adjust the loss and automatically estimate the corruption level via $\varepsilon = 1 - \frac{1}{n}\sum_i \pi_i$. The method supports standard ML, online learning, and deep networks through a stochastic EM variant and a truncation-based regularization to prevent overfitting in overparameterized models. Empirical results across linear/logistic regression, PCA, online HAR, and CNNs on corrupted data show that RLVI achieves higher accuracy or recall than state-of-the-art robust methods, with only modest computational overhead and without explicit tuning of the noise level. Overall, RLVI offers a parameter-free, scalable approach to robust learning with broad applicability to practical noisy-data settings.
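The weighting idea above can be illustrated with a small sketch. The summary does not spell out RLVI's update rules, so the code below is not the authors' algorithm: it is a classical EM for a two-component mixture, applied to linear regression, in which each sample carries a latent Bernoulli indicator $t_i$ and a weight $\pi_i$ approximating the probability that the sample is clean. The function name `robust_linreg_em`, the uniform outlier density, and the synthetic data are all assumptions made for illustration. The only parts taken from the text are the use of weights $\pi_i$ to adjust the loss and the estimate $\varepsilon = 1 - \frac{1}{n}\sum_i \pi_i$.

```python
import numpy as np

def robust_linreg_em(X, y, n_iter=100, outlier_density=None):
    """Simplified EM for linear regression with latent Bernoulli indicators.

    Returns (theta, weights, eps): coefficients, per-sample weights
    approximating P(t_i = 1), and the estimate eps = 1 - mean(weights).
    """
    n = X.shape[0]
    if outlier_density is None:
        # ASSUMPTION (not from the paper): corrupted targets are modelled
        # as uniform over the observed range of y.
        outlier_density = 1.0 / max(y.max() - y.min(), 1e-12)
    weights = np.full(n, 0.5)  # start undecided about every sample
    for _ in range(n_iter):
        # M-step: weighted least squares, noise variance, corruption level.
        sw = np.sqrt(weights)
        theta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        r = y - X @ theta
        sigma2 = max(np.sum(weights * r**2) / np.sum(weights), 1e-12)
        eps = np.clip(1.0 - weights.mean(), 1e-6, 1.0 - 1e-6)
        # E-step: posterior probability that each sample is clean,
        # computed in the log domain for numerical stability.
        log_clean = (np.log1p(-eps) - 0.5 * np.log(2 * np.pi * sigma2)
                     - r**2 / (2 * sigma2))
        log_corrupt = np.log(eps) + np.log(outlier_density)
        weights = np.exp(-np.logaddexp(0.0, log_corrupt - log_clean))
    return theta, weights, 1.0 - weights.mean()

# Hypothetical synthetic data: 30% of the targets are replaced by noise.
rng = np.random.default_rng(0)
n, theta_true = 500, np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, 3))
y = X @ theta_true + 0.1 * rng.normal(size=n)
corrupted = rng.choice(n, size=150, replace=False)
y[corrupted] = rng.uniform(-20.0, 20.0, size=150)

theta_hat, weights, eps_hat = robust_linreg_em(X, y)
print("estimated coefficients:", theta_hat)
print("estimated corruption level:", eps_hat)
```

In this sketch the corruption level is never supplied by the user; it is read off from the average weight at each iteration, which is the sense in which such a scheme needs no tuning of the noise level. A fixed outlier density is a strong modelling choice that this sketch makes only for simplicity.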

Abstract

We present an adaptive approach for robust learning from corrupted training sets. We identify corrupted and non-corrupted samples with latent Bernoulli variables and thus formulate the learning problem as maximization of the likelihood where latent variables are marginalized. The resulting problem is solved via variational inference, using an efficient Expectation-Maximization based method. The proposed approach improves over the state-of-the-art by automatically inferring the corruption level, while adding minimal computational overhead. We demonstrate our robust learning method and its parameter-free nature on a wide variety of machine learning tasks including online learning and deep learning where it adapts to different levels of noise and maintains high prediction accuracy.

Paper Structure (8 sections, 34 equations, 41 figures, 4 tables, 2 algorithms)

Figures (41)

  • Figure 1: Classification of online streaming data with a varying number of corrupted labels. The adaptive nature of our approach (RLVI) allows for automatic identification of outliers when learning from batches of data with different $\varepsilon$. Our method is robust and thus achieves higher accuracy than standard stochastic optimization of the likelihood (SGD).
  • Figure 2: Box plots of relative errors for a fixed value of corruption level $\varepsilon$. Left. Linear regression: relative errors ${\| \widehat{\bm{\theta}}-\bm{\theta}^{\star} \|_2 / \| \bm{\theta}^{\star} \|_2}$. Middle. Logistic regression: angle in degrees between the true separating hyperplane $\bm{\theta}^{\star}$ and estimates $\widehat{\bm{\theta}}$. Right. PCA: misalignment errors $1 - |\cos{( \widehat{\bm{\theta}}\,^\top \bm{\theta}^{\star} )}|$ for the subspace spanned by the first principal component. Each box spans the 25th to 75th percentiles; red dots depict the means. For all plots, 100 Monte Carlo runs are used.
  • Figure 3: Linear regression. Average relative error versus varying corruption level $\varepsilon$; 100 Monte Carlo runs are used.
  • Figure 4: Online classification. Left. Distribution of corruption level across batches of streaming data. Right. Recall (true positive rate) on left-out data versus total number of observed samples (smoothed with a moving average filter).
  • Figure 5: Image classification on CIFAR10 corrupted with synthetic noise (pairflip, $\varepsilon = 45\%$). Top. Accuracy on the clean testing set for standard SGD and RLVI. Bottom. Percentage of corrupted and non-corrupted samples correctly identified with the decision boundary ${\pi_i < \tau}$. In both plots, the dashed line corresponds to RLVI with no regularization, and the solid line indicates that truncation (${\pi_i < \tau \implies \pi_i \leftarrow 0}$) is used for the terms in anti-gradient updates. Threshold $\tau$ is computed from \ref{eq:error-ineq}. Regularization based on the bounded type II error makes differentiation of corrupted samples more effective, ultimately improving test accuracy in the overparameterized setting.
  • ...and 36 more figures

Theorems & Definitions (1)

  • proof