Jump-teaching: Combating Sample Selection Bias via Temporal Disagreement

Kangye Ji, Fei Cheng, Zeqing Wang, Qichang Zhang, Bohu Huang

TL;DR

Jump-teaching tackles compounding sample-selection bias in noisy-label learning with a Jump-update Strategy that exploits temporal disagreement within a single network to debias model updates, eliminating the need for dual networks or multi-round retraining. It also introduces a Single-loss Criterion that decomposes each sample's loss into a distribution of semantic sub-losses via a $K$-bit Hadamard codebook and an auxiliary head, enabling fine-grained, sample-wise selection. The method couples a two-head training pipeline with temperature scaling and a classifier-based recovery mechanism to stabilize selection and updates. Experiments on CIFAR-10/100 with various noise types and on real-world datasets show state-of-the-art robustness together with large efficiency gains (up to a $4.47\times$ training speedup and a $54\%$ reduction in peak memory). The result is a scalable, practical approach to robust learning under severe label noise.
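To make the $K$-bit Hadamard codebook idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the Sylvester construction, the choice to skip the all-ones row, the per-bit binary cross-entropy, and the use of the sub-loss variance as a score are all assumptions made for this example, and the function names (`hadamard_matrix`, `hadamard_codebook`, `sub_losses`) are hypothetical.

```python
import numpy as np


def hadamard_matrix(k: int) -> np.ndarray:
    """Build a k x k Hadamard matrix (entries +1/-1) by the Sylvester construction."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < k:
        h = np.block([[h, h], [h, -h]])
    return h


def hadamard_codebook(num_classes: int, k: int) -> np.ndarray:
    """Assign each class one K-bit codeword in {0, 1}.

    Assumption: row 0 (all ones) is skipped, so at most k - 1 classes fit.
    """
    if num_classes > k - 1:
        raise ValueError("need k - 1 >= num_classes")
    h = hadamard_matrix(k)
    return ((h[1:num_classes + 1] + 1) // 2).astype(np.float64)


def sub_losses(logits: np.ndarray, codeword: np.ndarray) -> np.ndarray:
    """Per-bit binary cross-entropy between auxiliary-head logits and a codeword.

    Assumption: one sigmoid output per code bit; the paper's exact loss may differ.
    """
    eps = 1e-12
    p = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)
    return -(codeword * np.log(p) + (1.0 - codeword) * np.log(1.0 - p))


if __name__ == "__main__":
    codebook = hadamard_codebook(num_classes=10, k=16)
    # Hypothetical logits that agree with class 3's codeword on every bit.
    logits = 4.0 * (2.0 * codebook[3] - 1.0)
    clean = sub_losses(logits, codebook[3])   # label matches the prediction
    noisy = sub_losses(logits, codebook[7])   # label disagrees with the prediction
    print("variance (matching label):   ", clean.var())
    print("variance (mismatching label):", noisy.var())
```

Any two distinct codewords differ in exactly $K/2$ bits, so a label that disagrees with the network's prediction yields a sub-loss vector that is large on half the bits and small on the other half. In this sketch that shows up as a high intra-sample variance, which can be computed for one sample at a time without ranking within a batch or modeling the whole dataset.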

Abstract

Sample selection is a straightforward technique to combat noisy labels, aiming to prevent mislabeled samples from degrading the robustness of neural networks. However, existing methods mitigate compounding selection bias either by leveraging dual-network disagreement or by running additional forward propagations, both of which multiply training overhead. To address this challenge, we introduce $\textit{Jump-teaching}$, an efficient sample selection framework with a debiased model update and a simplified selection criterion. Based on the key observation that a neural network exhibits significant disagreement across different training iterations, Jump-teaching proposes a jump-manner model update strategy that enables self-correction of selection bias by harnessing temporal disagreement, eliminating the need for multi-network or multi-round training. Furthermore, we employ a sample-wise selection criterion built on the intra-variance of a decomposed single loss, which allows fine-grained selection without relying on batch-wise ranking or dataset-wise modeling. Extensive experiments demonstrate that Jump-teaching outperforms state-of-the-art counterparts while achieving a nearly overhead-free selection procedure, which boosts training speed by up to $4.47\times$ and reduces peak memory footprint by $54\%$.

Paper Structure

This paper contains 11 sections, 10 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1:
  • Figure 2:
  • Figure 4: Error flows in Property 1-based and Property 2-based methods. The jump-update strategy reduces error accumulation without additional propagations or networks.
  • Figure 5: Test accuracies on CIFAR-10 with symmetric label noise (Sym. $\epsilon=0.8$).