ACD-U: Asymmetric co-teaching with machine unlearning for robust learning with noisy labels

Reo Fukunaga, Soh Yoshida, Mitsuji Muneyasu

TL;DR

ACD-U is an asymmetric co-teaching framework that pairs two different model architectures and incorporates machine unlearning, shifting the learning paradigm from passive error avoidance to active error correction.

Abstract

Deep neural networks are prone to memorizing incorrect labels during training, which degrades their generalizability. Recent methods combine sample selection with semi-supervised learning (SSL) to exploit the memorization effect (networks learn from clean data before noisy data), but they cannot correct selection errors once a sample is misclassified. To overcome this, we propose ACD-U, a framework built on asymmetric co-teaching with different architectures (ACD) that additionally incorporates machine unlearning. ACD-U addresses this limitation through two core mechanisms. First, its asymmetric co-teaching pairs a contrastive language-image pretraining (CLIP)-pretrained vision Transformer with a convolutional neural network (CNN), leveraging their complementary learning behaviors: the pretrained model provides stable predictions, whereas the CNN adapts throughout training. This asymmetry, in which the vision Transformer is trained only on clean samples and the CNN is trained through SSL, effectively mitigates confirmation bias. Second, selective unlearning enables post-hoc error correction: it identifies incorrectly memorized samples through loss trajectory analysis and CLIP consistency checks, and then removes their influence via Kullback-Leibler divergence-based forgetting. This approach shifts the learning paradigm from passive error avoidance to active error correction. Experiments on synthetic and real-world noisy datasets, including CIFAR-10/100, CIFAR-N, WebVision, Clothing1M, and Red Mini-ImageNet, demonstrate state-of-the-art performance, particularly in high-noise regimes and under instance-dependent noise. The code is publicly available at https://github.com/meruemon/ACD-U.
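
The selective-unlearning step summarized above can be illustrated with a minimal Python sketch. The abstract names the ingredients (loss trajectories, CLIP consistency, KL divergence-based forgetting) but not the exact rules, so the thresholds, the memorization test, the uniform forgetting target, the `temperature` parameter, and every function name below are hypothetical choices for illustration, not the authors' implementation (the linked repository is the authoritative source). `clip_preds` stands in for whatever CLIP-based prediction the method actually uses.

```python
import math
from typing import Dict, List, Sequence


def flag_unlearning_targets(
    loss_history: Dict[int, Sequence[float]],
    given_labels: Dict[int, int],
    clip_preds: Dict[int, int],
    early_loss_threshold: float = 1.0,
    late_loss_threshold: float = 0.1,
) -> List[int]:
    """Hypothetical selection rule (an assumption, not the paper's criterion).

    A sample is flagged when both hold:
      1. Loss trajectory looks like memorization: the mean loss over the first
         half of the recorded epochs exceeds early_loss_threshold, but the
         latest loss is below late_loss_threshold.
      2. The CLIP-based prediction disagrees with the given (possibly noisy) label.
    """
    flagged = []
    for idx, losses in loss_history.items():
        half = max(1, len(losses) // 2)
        early_mean = sum(losses[:half]) / half
        memorized = early_mean > early_loss_threshold and losses[-1] < late_loss_threshold
        clip_disagrees = clip_preds[idx] != given_labels[idx]
        if memorized and clip_disagrees:
            flagged.append(idx)
    return sorted(flagged)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_forgetting_loss(logits: Sequence[float], temperature: float = 1.0) -> float:
    """KL(uniform || softmax(logits / temperature)) for one flagged sample.

    Minimizing this pushes the model's prediction toward the uniform
    distribution, erasing the label it memorized. The uniform target is an
    assumption of this sketch.
    """
    probs = softmax(logits, temperature)
    k = len(probs)
    return sum((1.0 / k) * math.log((1.0 / k) / p) for p in probs)


if __name__ == "__main__":
    history = {
        0: [2.3, 2.1, 0.05, 0.02],  # high early loss, now fitted
        1: [0.4, 0.2, 0.05, 0.01],  # easy sample, learned early
        2: [2.2, 1.9, 0.04, 0.03],  # fitted late, but CLIP agrees with label
    }
    labels = {0: 3, 1: 1, 2: 5}
    clip = {0: 7, 1: 1, 2: 5}
    print(flag_unlearning_targets(history, labels, clip))  # [0]
    print(kl_forgetting_loss([5.0, 0.0, 0.0]))
```

With these toy inputs, only sample 0 is flagged: its loss fell from high to near zero and CLIP disagrees with its label. Sample 2 also looks memorized, but CLIP agrees with its label, so it is kept. Requiring both signals is one way to keep the unlearning set conservative.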

Paper Structure (38 sections, 15 equations, 6 figures, 13 tables, 6 algorithms)

Figures (6)

  • Figure 1: Overview of the ACD-U framework, comprising three main components: (1) unlearning sample selection, (2) sample forgetting via unlearning, and (3) asymmetric co-teaching between different model architectures. Unlearning sample selection is performed every $E_{UP}$ epochs. Subsequently, the sample forgetting process is executed for $E_{UD}$ epochs. The training dataset, $D_t$, used for ACD is constructed by excluding samples from the unlearning target datasets, $D^{(A)}_u$ and $D^{(V)}_u$.
  • Figure 2: Architecture of the asymmetric co-teaching with different architectures (ACD) framework, which employs an asymmetric training strategy: the pretrained vision Transformer (net $V$) trains only on labeled samples, whereas the CNN (net $A$) uses both labeled and unlabeled samples in a semi-supervised learning pipeline.
  • Figure 3: Accuracy curves on CIFAR-100 with (a) 50% and (b) 80% symmetric noise, showing the effects of the ACD and unlearning components, respectively.
  • Figure 4: Comparison of how noisy samples misidentified as clean (HN) are treated over time on CIFAR-100 (Sym. 80%). Left: In DivideMix, initial errors persist. Right: In ACD-U, initial errors are corrected.
  • Figure 5: Sensitivity analysis for the unlearning batch size and the forgetting intensity parameter $T_{unl}$. Panels (a) and (b) show results on the CIFAR datasets, whereas (c) and (d) show results on Clothing1M. For CIFAR-10/100, the “Best” accuracy is shown; for CIFAR-100N and Clothing1M, accuracies follow the same evaluation criteria as the corresponding CIFAR-N and Clothing1M result tables.
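
The data flow described in the Figure 1 and Figure 2 captions can also be sketched in Python. The construction of $D_t$ (all samples minus $D^{(A)}_u$ and $D^{(V)}_u$) and the asymmetric routing (net $V$ sees only labeled samples, net $A$ sees labeled plus unlabeled samples) follow the captions. The epoch-level timing (selection at every positive multiple of $E_{UP}$, followed by $E_{UD}$ forgetting epochs) and all function names are assumptions made for illustration, not the authors' code.

```python
from typing import Dict, Iterable, List, Set


def build_training_set(
    all_indices: Iterable[int], unlearn_a: Set[int], unlearn_v: Set[int]
) -> List[int]:
    """D_t: every sample except those in D_u^(A) or D_u^(V) (Figure 1 caption)."""
    excluded = unlearn_a | unlearn_v
    return [i for i in all_indices if i not in excluded]


def unlearning_schedule(num_epochs: int, e_up: int, e_ud: int) -> Dict[int, str]:
    """Tag each epoch as "select", "forget", or "acd" (hypothetical timing).

    Assumption: unlearning sample selection runs at every positive multiple
    of e_up, and the forgetting phase occupies the following e_ud epochs.
    "acd" marks epochs with no unlearning activity; whether ACD training
    also runs during forgetting epochs is not stated in the caption.
    """
    phases: Dict[int, str] = {}
    forget_left = 0
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % e_up == 0:
            phases[epoch] = "select"
            forget_left = e_ud
        elif forget_left > 0:
            phases[epoch] = "forget"
            forget_left -= 1
        else:
            phases[epoch] = "acd"
    return phases


def route_samples(
    train_set: Iterable[int], clean_set: Set[int]
) -> Dict[str, Dict[str, List[int]]]:
    """Asymmetric routing from the Figure 2 caption.

    net V (pretrained ViT) trains only on labeled (clean) samples; net A (CNN)
    trains on the labeled samples plus the rest of D_t as unlabeled SSL data.
    How clean_set is selected, and by which network, is left open here.
    """
    labeled = [i for i in train_set if i in clean_set]
    unlabeled = [i for i in train_set if i not in clean_set]
    return {
        "net_V": {"labeled": labeled},
        "net_A": {"labeled": labeled, "unlabeled": unlabeled},
    }


if __name__ == "__main__":
    d_t = build_training_set(range(6), unlearn_a={1, 2}, unlearn_v={2, 4})
    print(d_t)  # [0, 3, 5]
    print(route_samples(d_t, clean_set={0, 5}))
    print(unlearning_schedule(12, e_up=5, e_ud=2))
```

Keeping $D_t$ disjoint from both unlearning target sets means a sample being actively forgotten is never simultaneously reinforced by the co-teaching loss, which matches the exclusion described in the Figure 1 caption.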