May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels

Monica Millunzi, Lorenzo Bonicelli, Angelo Porrello, Jacopo Credi, Petter N. Kolm, Simone Calderara

TL;DR

The paper tackles continual learning with noisy labels in streaming data, where forgetting degrades memory quality. It introduces Alternate Experience Replay (AER) to exploit forgetting and separate clean from noisy/complex samples, and Asymmetric Balanced Sampling (ABS) to maintain current-task purity while preserving informative past samples; a buffer consolidation step using MixMatch further refines the memory. Across multiple benchmarks with synthetic and real noise, the approach yields consistent accuracy gains, notably an average improvement of 4.71 percentage points over loss-based purification baselines, and shows substantial speed advantages over competing online CLN methods. The method demonstrates strong robustness and applicability to online, multi-epoch settings, offering practical benefits for real-world learning with noisy annotations.

Abstract

Forgetting presents a significant challenge during incremental training, making it particularly demanding for contemporary AI systems to assimilate new knowledge in streaming data environments. To address this issue, most approaches in Continual Learning (CL) rely on the replay of a restricted buffer of past data. However, the presence of noise in real-world scenarios, where human annotation is constrained by time limitations or where data is automatically gathered from the web, frequently renders these strategies vulnerable. In this study, we address the problem of CL under Noisy Labels (CLN) by introducing Alternate Experience Replay (AER), which takes advantage of forgetting to maintain a clear distinction between clean, complex, and noisy samples in the memory buffer. The idea is that complex or mislabeled examples, which hardly fit the previously learned data distribution, are most likely to be forgotten. To reap the benefits of such a separation, we equip AER with Asymmetric Balanced Sampling (ABS): a new sample selection strategy that prioritizes purity on the current task while retaining relevant samples from the past. Through extensive computational comparisons, we demonstrate the effectiveness of our approach in terms of both accuracy and purity of the obtained buffer, resulting in a remarkable average gain of 4.71 percentage points in accuracy with respect to existing loss-based purification strategies. Code is available at https://github.com/aimagelab/mammoth.
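
The alternation that AER relies on can be illustrated with a minimal sketch. The abstract and the Figure 1 caption only state that epochs of replay and epochs of forgetting alternate; the even/odd ordering and the names `aer_phase` and `plan_task` are assumptions made here for illustration, not the authors' implementation.

```python
from typing import List


def aer_phase(epoch: int) -> str:
    """Return the phase for a given epoch index.

    Assumption: even epochs replay the buffer, odd epochs train on the
    current task only, so buffered samples are allowed to be forgotten.
    """
    return "replay" if epoch % 2 == 0 else "forget"


def plan_task(num_epochs: int) -> List[str]:
    """List the phase of every epoch for one task."""
    return [aer_phase(epoch) for epoch in range(num_epochs)]
```

In a training loop, a "forget" epoch would simply skip the buffer batch. The per-sample loss measured on the buffer after such an epoch is what is expected to separate clean samples from noisy ones.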

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, 7 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Training loss of clean and noisy examples during the second task of Seq. CIFAR-10 with $40\%$ noise. The loss is computed on examples from the first task stored in the memory buffer. Standard replay makes the two groups indistinguishable (left), but alternating epochs of replay and forgetting maintains a significant loss separation (right).
  • Figure 2: Asymmetric Balanced Sampling (ABS). Past examples are chosen to retain the most complex ones, while the criterion is reversed for the current task to maximize purity.
  • Figure 3: Final Average Accuracy (FAA, $[\uparrow]$) of DER++ with our method and buffer fitting.
  • Figure 4: Final composition of the buffer with different choices of sample selection.
  • Figure A: Effect of AER on the speed at which the model learns the noisy data.
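
The asymmetric criterion described in the Figure 2 caption can be sketched as follows. This is a deterministic simplification under stated assumptions, not the paper's procedure: per-sample loss is used as a proxy for complexity, each task gets the same budget, and the names `Entry`, `abs_select`, and `per_task_budget` are hypothetical.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Entry(NamedTuple):
    sample_id: int
    task_id: int
    loss: float  # assumed proxy: higher loss = more complex or noisy


def abs_select(entries: List[Entry], current_task: int,
               per_task_budget: int) -> List[Entry]:
    """Keep up to `per_task_budget` entries per task (assumed balance rule).

    Past tasks keep their highest-loss entries (most complex);
    the current task keeps its lowest-loss entries (most likely clean).
    """
    by_task = defaultdict(list)
    for entry in entries:
        by_task[entry.task_id].append(entry)

    kept: List[Entry] = []
    for task_id in sorted(by_task):
        is_past = task_id != current_task
        # Descending loss for past tasks, ascending for the current one.
        ranked = sorted(by_task[task_id], key=lambda e: e.loss,
                        reverse=is_past)
        kept.extend(ranked[:per_task_budget])
    return kept
```

The hard top-k cut is chosen here only for readability. A probabilistic variant would sample entries with weights derived from the same loss ranking.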