Dual-granularity Sinkhorn Distillation for Enhanced Learning from Long-tailed Noisy Data
Feng Hong, Yu Huang, Zihua Zhao, Zhihan Zhou, Jiangchao Yao, Dongsheng Li, Ya Zhang, Yanfeng Wang
TL;DR
This paper addresses learning from data in which class imbalance and label noise co-occur. It introduces Dual-granularity Sinkhorn Distillation (D-SINK), which distills knowledge from two auxiliary models, $f_L$ (trained for imbalance robustness) and $f_N$ (trained for noise robustness), into the target model $f$ through a trainable surrogate label matrix $Q$. The supervision combines sample-level alignment to $f_N$ with distribution-level alignment to $f_L$ via the objective $\mathcal{L}_{\mathrm{D-SINK}}$, which is trained within a bi-level optimization of $\mathcal{L}_{\mathrm{Overall}} = \mathcal{L}_{\mathrm{Base}} + \alpha \mathcal{L}_{\mathrm{D-SINK}}$. The matrix $Q$ is optimized by solving an entropic optimal transport (OT) problem with per-sample costs $P_i = -\log f_N(x_i) - \log f(x_i)$, subject to $Q 1_N = \sum_i f_L(x_i)$ (class marginals taken from $f_L$) and $Q^\top 1_C = 1_N$ (unit label mass per sample). The outer loop uses the Sinkhorn-Knopp algorithm to compute $Q = N \operatorname{diag}(u) M \operatorname{diag}(v)$, where $M = e^{-P/2}$ is applied elementwise and $u$, $v$ are the Sinkhorn scaling vectors, which keeps training scalable. Experiments on CIFAR-10/100 with synthetic and real-world noisy labels show that D-SINK outperforms strong baselines and serves as a universal framework for combining single-robustness models.
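The Sinkhorn-Knopp step for $Q$ can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sinkhorn_surrogate_labels`, its parameters (`p_noise`, `p_target`, `p_imb`, `n_iters`, `eps`), the fixed iteration count, and the normalized-marginal formulation (dividing both marginals by $N$ and rescaling by $N$ at the end) are all hypothetical choices. The inputs are assumed to be $(N, C)$ softmax outputs of $f_N$, $f$, and $f_L$; the kernel uses the $e^{-P/2}$ form stated above.

```python
import numpy as np


def sinkhorn_surrogate_labels(p_noise, p_target, p_imb, n_iters=500, eps=1e-12):
    """Hypothetical sketch of the Sinkhorn-Knopp update for the label matrix Q.

    p_noise, p_target, p_imb: (N, C) softmax probabilities assumed to come
    from f_N, the target model f, and f_L respectively.
    Returns Q of shape (C, N): each column sums to 1 and the row sums
    approximate sum_i f_L(x_i).
    """
    n = p_target.shape[0]
    # Cost matrix P (C x N): P[c, i] = -log f_N(x_i)[c] - log f(x_i)[c]
    cost = -(np.log(p_noise + eps) + np.log(p_target + eps)).T
    # Kernel M = exp(-P / 2), applied elementwise
    kernel = np.exp(-cost / 2.0)
    # Normalized marginals (assumption): class masses from f_L, uniform per sample
    r = p_imb.sum(axis=0) / n   # (C,), sums to 1
    c = np.full(n, 1.0 / n)     # (N,), sums to 1
    u = np.ones(kernel.shape[0])
    v = np.ones(n)
    for _ in range(n_iters):
        u = r / (kernel @ v + eps)    # rescale rows toward the class marginal
        v = c / (kernel.T @ u + eps)  # rescale columns toward the sample marginal
    # Q = N * diag(u) M diag(v)
    return n * (u[:, None] * kernel * v[None, :])


def softmax(z):
    """Row-wise softmax, used only to build toy probability inputs."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Toy usage with random "model outputs" (illustrative data, not from the paper)
rng = np.random.default_rng(0)
N, C = 8, 3
p_noise = softmax(rng.normal(size=(N, C)))
p_target = softmax(rng.normal(size=(N, C)))
p_imb = softmax(rng.normal(size=(N, C)))
Q = sinkhorn_surrogate_labels(p_noise, p_target, p_imb)
print(Q.shape)  # (3, 8)
```

Note that $e^{-P/2}$ with this cost equals $\sqrt{f_N(x_i) \odot f(x_i)}$, a geometric mean of the two predictions, so the kernel favors classes on which the noise-robust auxiliary and the target model agree, while the row constraint imposes the class distribution from $f_L$. For sharper costs, a log-domain Sinkhorn variant is usually more numerically stable than this plain version.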
Abstract
Real-world datasets for deep learning frequently suffer from co-occurring class imbalance and label noise, which hinder model performance. While methods exist for each issue, combining them effectively is non-trivial: genuine tail samples are difficult to distinguish from noisy ones, which often leads to conflicting optimization strategies. This paper presents a novel perspective: instead of primarily developing new complex techniques from scratch, we explore synergistically leveraging well-established, individually 'weak' auxiliary models, each specialized for either class imbalance or label noise but not both. This view is motivated by the insight that class imbalance (a distribution-level concern) and label noise (a sample-level concern) operate at different granularities, suggesting that their respective robustness mechanisms can, in principle, offer complementary strengths without conflict. We propose Dual-granularity Sinkhorn Distillation (D-SINK), a novel framework that enhances dual robustness by distilling and integrating complementary insights from such 'weak', single-purpose auxiliary models. Specifically, D-SINK uses a surrogate label allocation optimized via optimal transport to align the target model's sample-level predictions with a noise-robust auxiliary model and its class distribution with an imbalance-robust one. Extensive experiments on benchmark datasets demonstrate that D-SINK significantly improves robustness and achieves strong empirical performance in learning from long-tailed noisy data.
