Jump-teaching: Combating Sample Selection Bias via Temporal Disagreement
Kangye Ji, Fei Cheng, Zeqing Wang, Qichang Zhang, Bohu Huang
TL;DR
Jump-teaching tackles compounding sample-selection bias in noisy-label learning with a Jump-update Strategy that leverages temporal disagreement within a single network to debias updates, eliminating the need for dual networks or multi-round retraining. It also introduces a Single-loss Criterion that decomposes each sample's loss into a distribution of semantic sub-losses via a $K$-bit Hadamard codebook and an auxiliary head, enabling precise, sample-wise selection. The method couples a two-head training pipeline with temperature scaling and a classifier-based recovery mechanism to stabilize selection and updates. Extensive experiments on CIFAR-10/100 under various noise types and on real-world datasets show state-of-the-art robustness and substantial efficiency gains (up to $4.47\times$ faster training and a $54\%$ reduction in peak memory). This work provides a scalable, practical approach to robust learning under severe label noise.
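The Single-loss Criterion can be illustrated with a small NumPy sketch. Every concrete choice below is our assumption, not the authors' implementation: class $c$ is encoded by row $c$ of a $K\times K$ Sylvester Hadamard matrix (entries mapped to bits in {0, 1}), the auxiliary head is assumed to emit $K$ logits per sample, each bit contributes one binary cross-entropy sub-loss, and a sample is kept when the variance of its $K$ sub-losses falls below a hypothetical threshold `tau`. The function names `sylvester_hadamard`, `build_codebook`, `sub_losses`, and `select_clean` are illustrative only.

```python
import numpy as np


def sylvester_hadamard(k: int) -> np.ndarray:
    """Return a k x k Hadamard matrix with +1/-1 entries (k must be a power of 2)."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of 2")
    h = np.array([[1]])
    while h.shape[0] < k:
        # Sylvester construction: H_2n = [[H_n, H_n], [H_n, -H_n]]
        h = np.block([[h, h], [h, -h]])
    return h


def build_codebook(num_classes: int, k: int) -> np.ndarray:
    """Assumption: class c is encoded by row c of the Hadamard matrix, mapped to {0, 1} bits."""
    if num_classes > k:
        raise ValueError("a K-bit Hadamard codebook has at most K distinct codewords")
    return (sylvester_hadamard(k)[:num_classes] > 0).astype(np.float64)


def sub_losses(logits: np.ndarray, labels: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Per-bit binary cross-entropy between K auxiliary-head logits and the label's codeword.

    Assumption: the auxiliary head outputs one logit per codebook bit, shape (N, K).
    """
    targets = codebook[labels]  # (N, K) bit targets taken from the (possibly noisy) labels
    # Numerically stable BCE with logits: log(1 + e^z) - y * z
    return np.logaddexp(0.0, logits) - targets * logits


def select_clean(logits: np.ndarray, labels: np.ndarray, codebook: np.ndarray,
                 tau: float) -> tuple[np.ndarray, np.ndarray]:
    """Keep samples whose sub-loss variance is below the hypothetical threshold tau."""
    losses = sub_losses(logits, labels, codebook)
    score = losses.var(axis=1)  # intra-sample variance across the K sub-losses
    return score < tau, score
```

Because any two distinct Hadamard rows differ in exactly $K/2$ bits, a confidently predicted sample whose given label points to a different class incurs a large loss on half of its bits and a small loss on the rest, which inflates the variance, while a sample whose label agrees with the prediction has uniformly small sub-losses. For example, with $K=8$ and logits of $\pm 6$ matching the codeword of class 2, a sample labeled 2 scores a variance of 0, whereas the same prediction labeled 1 scores about 9.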
Abstract
Sample selection is a straightforward technique for combating noisy labels: it aims to prevent mislabeled samples from degrading the robustness of neural networks. However, existing methods mitigate compounding selection bias either by leveraging dual-network disagreement or by performing additional forward propagations, which multiplies training overhead. To address this challenge, we introduce $\textit{Jump-teaching}$, an efficient sample selection framework that debiases model updates and simplifies the selection criterion. Based on the key observation that a neural network exhibits significant disagreement across different training iterations, Jump-teaching proposes a jump-manner model update strategy that enables self-correction of selection bias by harnessing this temporal disagreement, eliminating the need for multi-network or multi-round training. Furthermore, we employ a sample-wise selection criterion built on the intra-variance of a decomposed single loss, enabling fine-grained selection without relying on batch-wise ranking or dataset-wise modeling. Extensive experiments demonstrate that Jump-teaching outperforms state-of-the-art counterparts while achieving a nearly overhead-free selection procedure, which boosts training speed by up to $4.47\times$ and reduces the peak memory footprint by $54\%$.
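The temporal disagreement behind the jump-manner update can be quantified in several ways. The sketch below uses one simple measure of our own choosing: the fraction of samples whose predicted class differs between two snapshots of the same network taken at different iterations. It only illustrates the observation and does not reproduce the paper's update rule; the function name `temporal_disagreement` is hypothetical.

```python
import numpy as np


def temporal_disagreement(logits_t1: np.ndarray, logits_t2: np.ndarray) -> float:
    """Fraction of samples whose argmax prediction differs between two snapshots.

    Assumption: both arrays score the same N samples over the same C classes,
    produced by one network at two different training iterations.
    """
    if logits_t1.shape != logits_t2.shape:
        raise ValueError("snapshots must score the same samples and classes")
    pred_t1 = logits_t1.argmax(axis=1)
    pred_t2 = logits_t2.argmax(axis=1)
    return float(np.mean(pred_t1 != pred_t2))
```

A high value means the two snapshots would select, and therefore reinforce, different subsets of samples, which is the kind of diversity that dual-network methods otherwise obtain from a second model.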
