Variation-Bounded Loss for Noise-Tolerant Learning
Jialiang Wang, Xiong Zhou, Xianming Liu, Gangfeng Hu, Deming Zhai, Junjun Jiang, Haoliang Li
TL;DR
The paper tackles robustness to noisy labels in supervised learning by introducing the Variation Ratio $v(L)$ as a fundamental property of loss functions and proposing Variation-Bounded Loss (VBL), a family of losses with finite $v(L)$. The authors prove that a smaller $v(L)$ yields tighter excess-risk bounds under symmetric and certain asymmetric label noise, and they show that $v(L)$ both relaxes the symmetric condition and offers a practical route to the asymmetric condition. They present three concrete variation-bounded losses (VCE, VEL, VSL) with tunable parameters. Empirically, VBL variants, including combinations with Normalized Cross Entropy (NCE), achieve strong performance on CIFAR benchmarks and on real-world noisy datasets such as WebVision, ILSVRC12, and Clothing1M, while also learning better feature representations under label noise. The work offers a compact, effective framework for designing robust losses that applies broadly to noisy-label scenarios.
Abstract
Mitigating the negative impact of noisy labels has been a perennial issue in supervised learning. Robust loss functions have emerged as a prevalent solution to this problem. In this work, we introduce the Variation Ratio as a novel property related to the robustness of loss functions, and propose a new family of robust loss functions, termed Variation-Bounded Loss (VBL), which is characterized by a bounded variation ratio. We provide theoretical analyses of the variation ratio, proving that a smaller variation ratio leads to better robustness. Furthermore, we reveal that the variation ratio provides a feasible way to relax the symmetric condition and offers a more concise path to the asymmetric condition. Based on the variation ratio, we reformulate several commonly used loss functions into a variation-bounded form for practical applications. Experiments on various datasets demonstrate the effectiveness and flexibility of our approach.
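The abstract builds on the standard symmetric condition from the robust-loss literature: a loss $L$ over $K$ classes is symmetric when $\sum_{k=1}^{K} L(f(x), k) = C$ for some constant $C$ and every prediction $f(x)$. The short sketch below checks this numerically for cross entropy (CE), mean absolute error (MAE), and the normalized cross entropy (NCE) of Ma et al. (2020), which the TL;DR pairs with VBL. It illustrates only the baseline condition that the variation ratio is said to relax. It does not implement the variation ratio or any VBL loss, because their definitions are not given in this section. All helper names (`softmax`, `ce`, `mae`, `nce`, `label_sum`) are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce(p, k):
    # Cross entropy for label k.
    return -math.log(p[k])

def mae(p, k):
    # L1 distance between p and the one-hot vector e_k; equals 2 - 2 * p[k].
    return sum(abs(pj - (1.0 if j == k else 0.0)) for j, pj in enumerate(p))

def nce(p, k):
    # Normalized CE: CE for label k divided by the sum of CE over all labels.
    return -math.log(p[k]) / sum(-math.log(pj) for pj in p)

def label_sum(loss, p):
    # Left-hand side of the symmetric condition: sum of the loss over all K labels.
    return sum(loss(p, k) for k in range(len(p)))

p_confident = softmax([2.0, 0.5, -1.0])
p_flat = softmax([0.1, 0.2, 0.3])

for name, loss in [("CE", ce), ("MAE", mae), ("NCE", nce)]:
    # A symmetric loss gives the same sum for both predictions.
    print(name, [round(label_sum(loss, p), 4) for p in (p_confident, p_flat)])
```

MAE sums to $2K - 2$ and NCE sums to $1$ for any prediction, so both satisfy the symmetric condition. The CE sum changes with the prediction, so CE does not.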
