Debiased Orthogonal Boundary-Driven Efficient Noise Mitigation
Hao Li, Jiayang Gu, Jingkuan Song, An Zhang, Lianli Gao
TL;DR
This work tackles noisy labels in large-scale pre-training by introducing One-Step Anti-noise (OSA), a model-agnostic, estimator-driven paradigm that uses a high-dimensional cosine boundary in cone space to separate clean from noisy samples. The authors establish a boundary principle under which the observed intersection boundary corresponds to a shifted orthogonal boundary, and support it with theoretical analysis of propagation and contrastive learning. OSA uses a zero-shot estimator (e.g., CLIP) to score sample cleanliness in a single forward pass, applies spatial debiasing to the boundary, and reweights losses to mitigate noise with minimal computational overhead. Across multimodal and single-modality tasks, OSA achieves state-of-the-art robustness and transferability while remaining efficient, which makes it practical for large-scale, noise-prone pre-training. The approach is architecture- and estimator-agnostic and scalable, and its code is publicly released.
Abstract
Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical. Nonetheless, existing noise mitigation methods often encounter limitations in practical applications due to their task-specific design, model dependency, and significant computational overhead. In this work, we exploit the properties of high-dimensional orthogonality to identify a robust and effective boundary in cone space for separating clean and noisy samples. Building on this, we propose One-Step Anti-noise (OSA), a model-agnostic noisy label mitigation paradigm that employs an estimator model and a scoring function to assess the noise level of input pairs through just one-step inference. We empirically validate the superiority of OSA, demonstrating its enhanced training robustness, improved task transferability, streamlined deployment, and reduced computational overhead across diverse benchmarks, models, and tasks. Our code is released at https://github.com/leolee99/OSA.
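The score-then-reweight idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the released repository is the reference for that). The function names, the use of deliberately mismatched pairs to estimate the boundary shift, and the clipped-linear weighting are all assumptions made for illustration, and the input arrays stand in for embeddings produced by an estimator such as CLIP.

```python
import numpy as np


def cosine_scores(a, b):
    """Row-wise cosine similarity between paired embeddings a[i] and b[i]."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def estimate_boundary(a, b):
    """Assumed debiasing step: mean cosine of deliberately mismatched pairs.

    Shifting b by one row pairs a[i] with b[i-1], so the result approximates
    the similarity of unrelated samples, i.e. how far the observed boundary
    sits from the ideal orthogonal value of 0.
    """
    return float(np.mean(cosine_scores(a, np.roll(b, 1, axis=0))))


def sample_weights(scores, boundary):
    """Hypothetical weighting: zero at or below the boundary, linear above."""
    return np.clip(scores - boundary, 0.0, None)


def reweighted_loss(losses, weights):
    """Weighted mean of per-sample losses; guards against all-zero weights."""
    return float(np.sum(weights * losses) / max(np.sum(weights), 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, dim = 8, 256
    offset = np.ones(dim)  # shared offset mimics embeddings lying in a cone
    img = rng.normal(size=(n, dim)) + offset
    txt = img + 0.1 * rng.normal(size=(n, dim))       # clean pairs
    txt[:2] = rng.normal(size=(2, dim)) + offset      # two noisy pairs

    scores = cosine_scores(img, txt)                  # one scoring pass
    boundary = estimate_boundary(img, txt)
    weights = sample_weights(scores, boundary)
    losses = rng.uniform(0.5, 2.0, size=n)            # placeholder losses
    print("boundary:", boundary)
    print("weights:", np.round(weights, 3))
    print("loss:", reweighted_loss(losses, weights))
```

In this toy setup, unrelated pairs have a clearly positive cosine because of the shared offset, so thresholding at 0 would not separate them from clean pairs. Subtracting the estimated boundary is what pushes the noisy pairs toward zero weight.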
