Optimized Gradient Clipping for Noisy Label Learning

Xichen Ye; Yifan Wu; Weizhong Zhang; Xiaoqiang Li; Yifan Chen; Cheng Jin

TL;DR

Optimized Gradient Clipping (OGC) addresses a limitation of fixed-threshold gradient clipping in learning with noisy labels by adjusting the clipping threshold $\tau^{(t)}$ at each training step. It models the cross-entropy losses of clean and noisy samples with a two-component Gaussian Mixture Model and chooses $\tau^{(t)}$ by solving an optimization problem that constrains the ratio of noisy to clean gradients after clipping; the clipped gradient corresponds to a transformed loss such as $\bar{\ell}_{CE}$. The paper provides a theoretical noise-tolerance analysis and reports improved performance under symmetric, asymmetric, instance-dependent, and real-world label noise, including further gains when OGC is combined with robust losses. The added computational overhead is modest, and the method scales to large noisy datasets such as WebVision.
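To make the link between clipping and a transformed loss concrete, here is a worked illustration. It assumes that clipping bounds the magnitude of the cross-entropy gradient with respect to the predicted probability $p$ of the labeled class, that $\tau = \tau^{(t)} \ge 1$, and that the transformed loss is required to be continuous; the paper's exact definition of $\bar{\ell}_{CE}$ may differ.

$$
\frac{\partial \ell_{CE}}{\partial p} = -\frac{1}{p},
\qquad
\bar{\ell}_{CE}(p) =
\begin{cases}
-\log p, & p \ge 1/\tau, \\
-\tau p + 1 + \log \tau, & p < 1/\tau.
\end{cases}
$$

Under these assumptions the gradient magnitude is $1/p$ when $p \ge 1/\tau$ and is capped at $\tau$ otherwise, and both branches equal $\log \tau$ at $p = 1/\tau$, so the loss stays continuous.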

Abstract

Previous research has shown that constraining the gradient of the loss function with respect to model-predicted probabilities can enhance model robustness against noisy labels. These methods typically select a fixed threshold for gradient clipping using validation data to obtain the desired robustness against noise. However, this common practice overlooks how the gradient distributions of clean and noisy-labeled samples change across stages of training, which significantly limits the model's ability to adapt to the variable nature of gradients throughout the training process. To address this issue, we propose a simple yet effective approach called Optimized Gradient Clipping (OGC), which dynamically adjusts the clipping threshold based on the ratio of noise gradients to clean gradients after clipping, estimated by modeling the distributions of clean and noisy samples. This approach allows us to modify the clipping threshold at each training step, effectively controlling the influence of noise gradients. Additionally, we provide statistical analysis to certify the noise-tolerance ability of OGC. Our extensive experiments across various types of label noise, including symmetric, asymmetric, instance-dependent, and real-world noise, demonstrate the effectiveness of our approach.
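The threshold-selection idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_two_gaussians` and `choose_threshold`, the ratio bound `eps`, the plain EM fit, and the grid search over candidate thresholds are all hypothetical choices made for this example. It also assumes that the mixture component with the lower mean loss corresponds to clean samples and that the per-sample gradient magnitude is $1/p = \exp(\ell)$ for a cross-entropy loss $\ell = -\log p$.

```python
import numpy as np


def fit_two_gaussians(losses, n_iter=50):
    """Fit a 1-D two-component Gaussian mixture with plain EM.

    Returns, for each sample, the posterior probability of the
    low-mean component, treated here as 'clean' (an assumption).
    """
    x = np.asarray(losses, dtype=float)
    # Initialise the two means at the lower and upper quartiles.
    mu = np.percentile(x, [25.0, 75.0])
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: per-sample posterior over the two components.
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    clean = int(np.argmin(mu))
    return resp[:, clean]


def choose_threshold(losses, eps=1.0, n_grid=200):
    """Pick the largest candidate threshold tau whose clipped
    noisy-to-clean gradient ratio is at most eps (hypothetical rule)."""
    x = np.asarray(losses, dtype=float)
    p_clean = fit_two_gaussians(x)
    # |d(-log p)/dp| = 1/p = exp(loss); cap the exponent to avoid overflow.
    grad = np.exp(np.minimum(x, 50.0))
    taus = np.geomspace(1.0, max(grad.max(), 1.0), n_grid)
    best = taus[0]  # fall back to the smallest candidate
    for tau in taus:
        clipped = np.minimum(grad, tau)
        noisy = np.sum((1.0 - p_clean) * clipped)
        clean = np.sum(p_clean * clipped)
        if noisy <= eps * clean:
            best = tau
    return float(best)
```

In a training loop, one would call `choose_threshold` on the per-sample losses of the current step (or a recent window of steps) and clip with the returned value. A smaller `eps` can only yield an equal or smaller threshold in this sketch, since it tightens the constraint over the same candidate grid.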
