Multiplicative Reweighting for Robust Neural Network Optimization

Noga Bar, Tomer Koren, Raja Giryes

TL;DR

This work introduces Multiplicative Reweighting (MR), a plug-in optimization technique that uses multiplicative weights to reweight training examples during neural network optimization. Treating each example as an expert, MR maintains a distribution $p\in\Delta_N$ over the $N$ training examples and updates it multiplicatively based on the observed losses, which downweights noisy data; meanwhile, the model parameters $\theta$ are updated on the weighted empirical loss $F(\theta,p)=\sum_i p_i\ell_i(\theta)$. The authors prove convergence of MR with gradient-based methods and provide guarantees for 1d label-noise examples, then demonstrate empirical gains on CIFAR-10/100 and Clothing1M under synthetic and real label noise, as well as improved adversarial robustness when MR is combined with established defenses. MR incurs modest training-time overhead and integrates easily with common optimizers such as SGD and Adam, making it a practical addition to the toolkit for improving robustness to label noise and adversarial perturbations.
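To make the alternating updates on $\theta$ and $p$ concrete, here is a minimal pure-Python sketch on a 1d toy problem. It assumes the squared loss $\ell_i(\theta)=\frac{1}{2}(\theta-y_i)^2$ and the standard MW rule $p_{t+1,i}\propto p_{t,i}\exp(-\eta\,\ell_i(\theta_{t+1}))$. The function name `mr_gd_1d`, the toy targets, and the choice to compute the MW losses at the freshly updated $\theta$ are illustrative assumptions, not the paper's reference implementation. Because the squared loss is 1-smooth, $\alpha=1/\beta=1$, so each GD step moves $\theta$ exactly to the $p$-weighted mean of the targets.

```python
import math

def mr_gd_1d(ys, eta=0.5, steps=50, alpha=1.0):
    """Alternate a GD step on F(theta, p) = sum_i p_i * 0.5 * (theta - y_i)**2
    with a multiplicative-weights (MW) update of the example weights p."""
    n = len(ys)
    p = [1.0 / n] * n  # start from the uniform distribution over examples
    theta = 0.0
    for _ in range(steps):
        # GD step on theta: dF/dtheta = sum_i p_i * (theta - y_i)
        grad = sum(pi * (theta - y) for pi, y in zip(p, ys))
        theta -= alpha * grad
        # MW step (assumed form): shrink each weight by exp(-eta * loss), then renormalize
        losses = [0.5 * (theta - y) ** 2 for y in ys]
        p = [pi * math.exp(-eta * loss) for pi, loss in zip(p, losses)]
        total = sum(p)
        p = [pi / total for pi in p]
    return theta, p

# Toy data: 8 clean targets at 1.0 and 2 corrupted ("noisy-label") targets at 5.0.
ys = [1.0] * 8 + [5.0] * 2
theta_gd, _ = mr_gd_1d(ys, eta=0.0)     # eta = 0 keeps p uniform (plain GD): approx. 1.8
theta_mr, p_mr = mr_gd_1d(ys, eta=0.5)  # with MR: close to the clean value 1.0
noisy_mass = sum(p_mr[8:])              # total weight left on the corrupted targets
```

With $\eta=0$ the weights never move, and the procedure reduces to plain GD, which settles at the unweighted mean 1.8. With $\eta=0.5$ the corrupted targets lose almost all of their weight within a few steps, and $\theta$ ends near 1.0. In a real network, the same idea would apply to per-example losses, with SGD or Adam in place of the GD step.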

Abstract

Neural networks are widespread due to their strong performance. Yet, they degrade in the presence of noisy labels at training time. Inspired by the setting of learning with expert advice, where multiplicative weights (MW) updates were recently shown to be robust to moderate data corruptions in expert advice, we propose to use MW for reweighting examples during neural network optimization. We theoretically establish the convergence of our method when used with gradient descent and prove its advantages in 1d cases. We then empirically validate our findings in the general case by showing that MW improves neural networks' accuracy in the presence of label noise on CIFAR-10, CIFAR-100 and Clothing1M. We also show the impact of our approach on adversarial robustness.

Paper Structure

This paper contains 27 sections, 22 theorems, 57 equations, 7 figures, 13 tables, 4 algorithms.

Key Result

Lemma 1

For a $\beta$-smooth loss $\ell(\cdot)$, with $\theta_{t+1}$ and $p_{t+1}$ updated by the gradient-descent-with-MW algorithm (alg:gd_mw) using GD step size $\alpha=\frac{1}{\beta}$ and MW step size $\eta > 0$, …
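Lemma 1's title ("Equivalence to Descent Lemma") points to the standard descent lemma. As background only (not the lemma's own conclusion), the two textbook facts below show why an alternating GD + MW step cannot increase $F$. The map $\theta\mapsto F(\theta,p_t)$ is $\beta$-smooth because it is a convex combination of $\beta$-smooth losses, and an MW reweighting at a fixed $\theta$ can only lower the weighted loss:

```latex
% (1) Descent lemma for the beta-smooth map theta -> F(theta, p_t), step alpha = 1/beta:
F\!\left(\theta_t - \tfrac{1}{\beta}\nabla_\theta F(\theta_t,p_t),\; p_t\right)
  \le F(\theta_t,p_t) - \tfrac{1}{2\beta}\,\big\|\nabla_\theta F(\theta_t,p_t)\big\|^2 .

% (2) MW step at a fixed theta: with q_{\eta,i} \propto p_{t,i}\, e^{-\eta \ell_i(\theta)} (so q_0 = p_t),
\frac{d}{d\eta}\sum_i q_{\eta,i}\,\ell_i(\theta) = -\operatorname{Var}_{q_\eta}\!\big[\ell(\theta)\big] \le 0
  \quad\Longrightarrow\quad F(\theta, p_{t+1}) \le F(\theta, p_t).
```

Chaining (1) and (2) gives $F(\theta_{t+1},p_{t+1})\le F(\theta_t,p_t)$ when the MW update uses the losses $\ell(\theta_{t+1})$; this is an illustration, not the paper's proof.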

Figures (7)

  • Figure 1: Evolution of the total multiplicative weight on clean and on noisy examples with 20% and 40% label noise (all weights sum to 1). The weight of the noisy examples decreases during training, so they have progressively less effect on the network.
  • Figure 1: Loss evolution of the 1d illustrative examples with and without our MR technique.
  • Figure 1: Loss of examples with the $2\%$ highest and lowest weights. Left and right: training with 40% and 20% noise levels, respectively.
  • Figure 1: Fraction of examples with noisy labels across weight percentiles, trained on CIFAR-10; (a) and (b) correspond to 40% and 20% noise levels, respectively.
  • Figure 2: Simulations of the Lipschitzness property of MR.
  • ...and 2 more figures
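The figure captions track two diagnostics of the learned weights: the total weight on noisy versus clean examples, and the fraction of noisy labels within weight percentiles. Assuming the final weights `p` and a ground-truth `is_noisy` mask are available (as they are when label noise is injected synthetically), both could be computed along these lines. The function names and the equal-size binning are hypothetical choices, not taken from the paper.

```python
def noisy_weight_mass(p, is_noisy):
    """Total weight that the distribution p assigns to examples with corrupted labels."""
    return sum(pi for pi, noisy in zip(p, is_noisy) if noisy)

def noisy_fraction_by_percentile(p, is_noisy, bins=10):
    """Sort examples by weight (lowest first), split them into `bins` groups of
    (nearly) equal size, and return the fraction of noisy labels in each group."""
    if len(p) < bins:
        raise ValueError("need at least as many examples as bins")
    order = sorted(range(len(p)), key=lambda i: p[i])
    size = len(order) // bins
    fractions = []
    for b in range(bins):
        # the last group absorbs any remainder so every example is counted
        group = order[b * size:(b + 1) * size] if b < bins - 1 else order[b * size:]
        fractions.append(sum(1 for i in group if is_noisy[i]) / len(group))
    return fractions

# Toy weights: 4 clean examples kept at 0.24 each, 2 noisy examples shrunk to 0.02 each.
p = [0.24] * 4 + [0.02] * 2
is_noisy = [False] * 4 + [True] * 2
mass = noisy_weight_mass(p, is_noisy)                          # 0.04
fractions = noisy_fraction_by_percentile(p, is_noisy, bins=3)  # [1.0, 0.0, 0.0]
```

If MR behaves as the captions describe, `mass` should shrink over training, and the noisy labels should concentrate in the lowest-weight bins.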

Theorems & Definitions (36)

  • Lemma 1: Equivalence to Descent Lemma
  • Theorem 2: convergence
  • Corollary 3: convergence rate
  • Theorem 4: SGD convergence
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Theorem 8
  • Theorem 9
  • Lemma 10
  • ...and 26 more