ACD-U: Asymmetric co-teaching with machine unlearning for robust learning with noisy labels

Reo Fukunaga; Soh Yoshida; Mitsuji Muneyasu

TL;DR

ACD-U extends asymmetric co-teaching with different architectures (ACD), a co-teaching framework that pairs two different model architectures, with machine unlearning, shifting the learning paradigm from passive error avoidance to active error correction.

Abstract

Deep neural networks are prone to memorizing incorrect labels during training, which degrades their generalizability. Recent methods combine sample selection with semi-supervised learning (SSL) to exploit the memorization effect (networks learn from clean data before noisy data), but they cannot correct selection errors once a sample has been misclassified. To overcome this limitation, we propose ACD-U, which extends asymmetric co-teaching with different architectures (ACD) with machine unlearning. ACD-U relies on two core mechanisms. First, its asymmetric co-teaching pairs a contrastive language-image pretraining (CLIP)-pretrained vision Transformer with a convolutional neural network (CNN), leveraging their complementary learning behaviors: the pretrained model provides stable predictions, whereas the CNN adapts throughout training. This asymmetry, in which the vision Transformer is trained only on clean samples and the CNN is trained through SSL, effectively mitigates confirmation bias. Second, selective unlearning enables post-hoc error correction: it identifies incorrectly memorized samples through loss trajectory analysis and CLIP consistency checks, and then removes their influence via Kullback-Leibler (KL) divergence-based forgetting. This approach shifts the learning paradigm from passive error avoidance to active error correction. Experiments on synthetic and real-world noisy datasets, including CIFAR-10/100, CIFAR-N, WebVision, Clothing1M, and Red Mini-ImageNet, demonstrate state-of-the-art performance, particularly in high-noise regimes and under instance-dependent noise. The code is publicly available at https://github.com/meruemon/ACD-U.
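The abstract states only that forgetting is "Kullback-Leibler divergence-based". One common form of KL-based forgetting pushes a model's predictions on the forget set toward the uniform distribution u by minimizing KL(u || p). The minimal NumPy sketch below assumes that uniform target and uses a linear softmax classifier in place of the CNN or vision Transformer so that it stays self-contained. The helper names (`softmax`, `kl_to_uniform`, `forget_step`) are hypothetical, and ACD-U's actual reference distribution, loss direction, and optimizer may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl_to_uniform(p):
    """Mean KL(u || p) over rows, where u is uniform over the classes (assumed target)."""
    u = np.full_like(p, 1.0 / p.shape[1])
    return float(np.mean(np.sum(u * (np.log(u) - np.log(p + 1e-12)), axis=1)))

def forget_step(W, x_forget, lr=0.1):
    """One gradient-descent step on mean KL(u || softmax(x_forget @ W))
    for a linear classifier W of shape (features, classes)."""
    p = softmax(x_forget @ W)
    u = np.full_like(p, 1.0 / p.shape[1])
    grad_logits = (p - u) / len(x_forget)  # d KL(u || softmax(z)) / dz = p - u
    return W - lr * x_forget.T @ grad_logits
```

Because the gradient with respect to the logits is simply p - u, each step lowers the logits of over-confident classes on the flagged samples; samples outside the forget set do not enter this loss at all.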

Paper Structure (38 sections, 15 equations, 6 figures, 13 tables, 6 algorithms)

Introduction
Related work
  Machine unlearning
  Learning with noisy labels
Proposed method
  Unlearning target sample selection
    Condition 1: Low-loss samples
    Condition 2: Loss-drop samples
    Condition 3: CLIP-consistent
    Finalizing the unlearning sets
  Forgetting noisy samples
  Asymmetric co-teaching
    Phase 1: Sample selection
    Phase 2: Pseudo-labeling
    Phase 3: Mixup augmentation
...and 23 more sections

Figures (6)

Figure 1: Overview of the ACD-U framework, comprising three main components: (1) unlearning sample selection, (2) sample forgetting via unlearning, and (3) asymmetric co-teaching between different model architectures. Unlearning sample selection is performed every $E_{UP}$ epochs. Subsequently, the sample forgetting process is executed for $E_{UD}$ epochs. The training dataset, $D_t$, used for ACD is constructed by excluding samples from the unlearning target datasets, $D^{(A)}_u$ and $D^{(V)}_u$.
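Figure 1 and the section outline (Condition 1: low-loss, Condition 2: loss-drop, Condition 3: CLIP-consistent) indicate how the unlearning sets $D^{(A)}_u$ and $D^{(V)}_u$ are chosen, but this section gives no thresholds and does not say how the conditions combine. The sketch below makes hypothetical choices: a sample is flagged when its current loss lies in a lower quantile (Condition 1), its loss fell by at least a threshold over a recent window (Condition 2), and CLIP's zero-shot prediction disagrees with its given label (one possible reading of Condition 3), with the three combined by logical AND. It then builds $D_t$ by excluding both unlearning sets, as the caption describes. All function names, parameters, and defaults are illustrative assumptions.

```python
import numpy as np

def select_unlearning_targets(loss_history, labels, clip_pred,
                              low_loss_quantile=0.5, drop_threshold=1.0,
                              window=5):
    """Return indices flagged for unlearning under hypothetical criteria.
    loss_history: per-sample losses recorded over epochs, shape (epochs, N)."""
    loss_history = np.asarray(loss_history, dtype=float)
    current = loss_history[-1]
    past = loss_history[max(0, len(loss_history) - 1 - window)]
    # Condition 1 (assumed): the sample is already fitted, i.e. its loss is low.
    low_loss = current <= np.quantile(current, low_loss_quantile)
    # Condition 2 (assumed): the loss dropped sharply within the recent window,
    # a pattern associated with late memorization of noisy labels.
    loss_drop = (past - current) >= drop_threshold
    # Condition 3 (assumed reading): CLIP's zero-shot class disagrees with the label.
    clip_disagrees = np.asarray(clip_pred) != np.asarray(labels)
    return np.flatnonzero(low_loss & loss_drop & clip_disagrees)

def build_training_set(num_samples, forget_a, forget_v):
    """D_t = D minus (D_u^(A) union D_u^(V)), returned as sorted sample indices."""
    excluded = {int(i) for i in forget_a} | {int(i) for i in forget_v}
    return [i for i in range(num_samples) if i not in excluded]
```

In ACD-U each network contributes its own unlearning set, so the selection function would be called once per network (with that network's loss history) before $D_t$ is assembled.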
Figure 2: Architecture of the asymmetric co-teaching with different architectures (ACD) framework, which employs an asymmetric training strategy: the pretrained vision Transformer (net $V$) is trained only on labeled samples, whereas the CNN (net $A$) uses both labeled and unlabeled samples in a semi-supervised learning pipeline.
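The asymmetric routing in Figure 2 can be sketched as below. The outline names pseudo-labeling (Phase 2) and Mixup augmentation (Phase 3), but this section gives no formulas, so the sketch borrows DivideMix/MixMatch conventions: temperature sharpening of predictions into pseudo-labels, and MixUp with lam drawn from Beta(alpha, alpha) and then set to max(lam, 1 - lam). These conventions, the default values, and the function names are assumptions, not ACD's confirmed settings.

```python
import numpy as np

def split_for_asymmetric_training(x, y, clean_mask):
    """Route one batch: net V sees labeled (clean) samples only,
    net A sees the same labeled samples plus the rest as unlabeled data."""
    clean_mask = np.asarray(clean_mask, dtype=bool)
    labeled = (x[clean_mask], y[clean_mask])
    unlabeled = x[~clean_mask]  # given labels are discarded for SSL
    batch_v = labeled
    batch_a = {"labeled": labeled, "unlabeled": unlabeled}
    return batch_v, batch_a

def sharpen(p, temperature=0.5):
    """Temperature sharpening of (averaged) predictions into soft pseudo-labels."""
    p = p ** (1.0 / temperature)
    return p / p.sum(axis=1, keepdims=True)

def mixup(x, targets, alpha=4.0, rng=None):
    """DivideMix-style MixUp on inputs and soft targets."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep each mixed sample closer to its own input
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    t_mix = lam * targets + (1.0 - lam) * targets[perm]
    return x_mix, t_mix, lam
```

Keeping net V on labeled samples only means its predictions never depend on pseudo-labels, which is one way to read the abstract's claim that the asymmetry mitigates confirmation bias.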
Figure 3: Accuracy curves on CIFAR-100 with (a) 50% and (b) 80% symmetric noise, showing the effects of the ACD and unlearning components, respectively.
Figure 4: Comparison of how noisy samples misidentified as clean (HN) are treated over time on CIFAR-100 (Sym. 80%). Left: In DivideMix, initial errors persist. Right: In ACD-U, initial errors are corrected.
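Figure 4 tracks HN samples (noisy samples misidentified as clean) over training. Measuring this requires ground-truth noise indicators, which exist for synthetic noise such as CIFAR-100 with 80% symmetric noise. The hypothetical helper below follows the samples that are HN at the first recorded epoch and reports the fraction still selected as clean at each later epoch; the bookkeeping behind the paper's figure may differ.

```python
import numpy as np

def hn_mask(selected_clean, is_noisy):
    """HN: samples whose label is actually noisy but that were selected as clean."""
    return np.asarray(selected_clean, dtype=bool) & np.asarray(is_noisy, dtype=bool)

def hn_trajectory(selections, is_noisy):
    """selections: one clean-selection mask per epoch.
    Returns, per epoch, the fraction of the initial HN samples still treated
    as clean (a falling curve means initial selection errors are corrected)."""
    initial = hn_mask(selections[0], is_noisy)
    n0 = initial.sum()
    if n0 == 0:
        return [0.0 for _ in selections]
    return [float((hn_mask(sel, is_noisy) & initial).sum() / n0) for sel in selections]
```

Under this measure, the DivideMix panel would correspond to a curve that stays high and the ACD-U panel to one that decreases.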
Figure 5: Sensitivity analysis for the unlearning batch size and the forgetting intensity parameter $T_{unl}$. Panels (a) and (b) show the results for the CIFAR datasets, whereas (c) and (d) show those for Clothing1M. For CIFAR-10/-100, the “Best” accuracy is shown, whereas for CIFAR-100N and Clothing1M, the reported accuracies follow the same evaluation criteria as those in the CIFAR-N and Clothing1M results tables.
...and 1 more figure