Combating Noisy Labels via Dynamic Connection Masking

Xinlei Zhang, Fan Liu, Chuanyi Zhang, Fan Cheng, Yuhui Zheng

TL;DR

Noisy labels cause gradient contamination and overfitting in deep networks. Dynamic Connection Masking (DCM) adaptively masks less informative edges in fully connected (FC) and Kolmogorov-Arnold Network (KAN) classifiers to suppress the propagation of noisy gradients while preserving the learning signal, and it integrates with existing robust losses and sample-selection strategies. Experiments on CIFAR-10/100, WebVision-Mini, and Clothing1M show consistent gains, with KAN-based classifiers often achieving better noise robustness than MLPs in real-world noisy settings. Overall, DCM serves as a versatile, plug-and-play regularizer that improves robustness on both synthetic and real-world benchmarks.

Abstract

Noisy labels are inevitable in real-world scenarios. Because deep neural networks can readily memorize corrupted labels, noisy labels can cause significant performance degradation. Existing research on mitigating the negative effects of noisy labels has mainly focused on robust loss functions and sample selection, with comparatively limited exploration of regularization in the model architecture. Inspired by the sparsity regularization used in Kolmogorov-Arnold Networks (KANs), we propose a Dynamic Connection Masking (DCM) mechanism for both Multi-Layer Perceptrons (MLPs) and KANs to enhance the robustness of classifiers against noisy labels. The mechanism adaptively masks less important edges during training by evaluating their information-carrying capacity. Through theoretical analysis, we show that it effectively reduces gradient error. Our approach can be seamlessly integrated into various noise-robust training methods, including robust loss functions, sample selection strategies, and regularization techniques, to build more robust deep networks. Extensive experiments on both synthetic and real-world benchmarks demonstrate that our method consistently outperforms state-of-the-art (SOTA) approaches. Furthermore, we are the first to investigate KANs as classifiers against noisy labels, revealing their superior noise robustness over MLPs in real-world noisy scenarios. Our code will soon be publicly available.
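The property behind the gradient-error argument is that a masked edge is removed from the backward pass as well as the forward pass. Below is a minimal NumPy sketch for an FC classifier only, assuming the mask is a binary matrix `M` multiplied element-wise into the weight matrix `W`. The function names are hypothetical, the gradient is written out by hand so the zeroing is explicit, and this is not the authors' implementation; the KAN case is not covered.

```python
import numpy as np


def masked_fc_logits(x, W, M):
    """Logits of an FC classifier whose edges are gated by a binary mask.

    x: (B, d) input features, W: (C, d) edge weights, M: (C, d) 0/1 mask.
    """
    return x @ (W * M).T  # (B, C)


def softmax_ce_grad(logits, labels):
    """Gradient of the batch-mean softmax cross-entropy w.r.t. the logits."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0
    return p / len(labels)


def masked_fc_weight_grad(x, M, grad_logits):
    """Gradient of the loss w.r.t. W, given dL/dlogits of shape (B, C).

    The logits depend on W only through W * M, so the chain rule multiplies
    the dense gradient grad_logits.T @ x by M: masked edges get zero gradient.
    """
    return (grad_logits.T @ x) * M  # (C, d)


# Usage on random data with two edges masked out.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
W = rng.normal(size=(3, 5))
M = np.ones_like(W)
M[0, 0] = M[1, 2] = 0.0
labels = rng.integers(0, 3, size=8)
gW = masked_fc_weight_grad(x, M, softmax_ce_grad(masked_fc_logits(x, W, M), labels))
assert np.all(gW[M == 0] == 0)  # no update reaches the masked edges
```

Only the surviving edges receive updates, so whatever error a noisy label injects into dL/dlogits can reach the weights, and through them the backbone, only along unmasked pathways.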

Paper Structure

This paper contains 19 sections, 5 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Comparison of methods for learning with noisy labels. Robust loss functions provide noise-tolerant objectives for optimization. Sample selection strategies aim to identify clean data $\tilde{y}_s$ among noisy samples $\tilde{y}$. Popular regularization methods, such as Dropout or DropConnect, randomly remove neurons or connections to mitigate overfitting. Our DCM selectively masks classifier connections so that gradients backpropagate only through important pathways.
  • Figure 2: Overview of our dynamic connection masking mechanism. (i) We first compute the edge activation $\boldsymbol{A} \in \mathbb{R}^{B \times C \times d}$ by multiplying each input feature $v_{ik}$ by its corresponding edge weight $w_{jk}$, i.e., $A_{ijk} = v_{ik} w_{jk}$, where $B$, $C$, and $d$ denote the batch size, the total number of classes, and the input feature dimension. The edge importance score $\boldsymbol{S}$ is then the standard deviation of $\boldsymbol{A}$ along the batch dimension, $S_{jk} = \operatorname{std}_i(A_{ijk})$ (Eq. 2). (ii) During training, we adaptively mask edges with lower importance scores, re-evaluating the mask every $t$ timesteps.
  • Figure 3: Comparison of the gradient error $\varepsilon_{f}$ across models under various noise levels on CIFAR-10. $f_{\text{CE-DFC}}$ and $f_{\text{ANL-DFC}}$ denote classifiers using our dynamic connection masking combined with CE and ANL, respectively, while $f_{\text{CE}}$ and $f_{\text{ANL}}$ denote the original CE and ANL methods with a standard FC classifier. Panels (a) to (e) show, over training epochs, the average cosine similarity between the clean and noisy gradients of the backbone's last-layer parameters.
  • Figure 4: Confidence on noisy and clean samples for different classifiers on CIFAR-10. For a clearer comparison, only results from mid-training to the final epoch are shown.
  • Figure 5: Visualization of DFC and DKAN. During training, the edge connection patterns adapt dynamically to optimize information flow, which facilitates the propagation of critical features and improves the network's resilience to noise.
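Steps (i) and (ii) of Figure 2 can be sketched in NumPy for an FC classifier. This is a minimal sketch under stated assumptions: the caption does not say how many edges are masked or whether scores are ranked per class or over the whole layer, so the `mask_ratio` parameter and the global lowest-score rule are hypothetical, as are all function names. The population standard deviation (`ddof=0`) is also an assumption, and the KAN variant (DKAN) is not covered.

```python
import numpy as np


def edge_importance(v, w):
    """Edge importance scores following the Figure 2 description.

    v: (B, d) input features, w: (C, d) edge weights.
    A[i, j, k] = v[i, k] * w[j, k] is the activation on edge (j, k) for
    sample i; S[j, k] is its standard deviation over the batch axis
    (population std, ddof=0, is an assumption).
    """
    A = v[:, None, :] * w[None, :, :]  # (B, C, d)
    return A.std(axis=0)                # (C, d)


def lowest_score_mask(S, mask_ratio):
    """0/1 mask dropping the fraction `mask_ratio` of edges with the lowest
    scores. Ranking all C * d edges globally is an assumption."""
    n_drop = int(mask_ratio * S.size)
    M = np.ones(S.size)
    M[np.argsort(S, axis=None)[:n_drop]] = 0.0  # flat indices of lowest scores
    return M.reshape(S.shape)


def refresh_mask(step, t, M, v, w, mask_ratio):
    """Recompute the mask every t steps; otherwise keep the current mask."""
    if step % t == 0:
        return lowest_score_mask(edge_importance(v, w), mask_ratio)
    return M
```

For a plain FC layer the score factorizes: since $A_{ijk} = v_{ik} w_{jk}$ and scaling a variable by a constant scales its standard deviation by the constant's absolute value, $S_{jk} = |w_{jk}| \cdot \operatorname{std}_i(v_{ik})$. The score therefore combines weight magnitude with how much feature $k$ varies across the batch, so a large weight on a nearly constant feature still scores low.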
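Figure 3 tracks the cosine similarity between gradients computed with clean labels and with noisy labels. A toy NumPy sketch of that measurement follows, assuming a linear last backbone layer `U` that feeds a masked FC classifier `W * M`; the model, names, and shapes are hypothetical. The exact definition of $\varepsilon_f$ is not given in this summary, so the sketch computes only the plotted cosine similarity, and it does not guard against an all-zero gradient (for example, when every edge is masked).

```python
import numpy as np


def softmax_ce_grad(logits, labels):
    """Gradient of the batch-mean softmax cross-entropy w.r.t. the logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0
    return p / len(labels)


def backbone_last_layer_grad(x, U, W, M, labels):
    """dL/dU for a toy model with a linear last backbone layer h = x @ U.T
    and a masked FC classifier logits = h @ (W * M).T.

    x: (B, p), U: (d, p), W and M: (C, d). Returns an array shaped like U.
    """
    W_eff = W * M
    g_logits = softmax_ce_grad((x @ U.T) @ W_eff.T, labels)  # (B, C)
    g_h = g_logits @ W_eff  # (B, d): masked edges pass no gradient back
    return g_h.T @ x        # (d, p)


def gradient_cosine(x, U, W, M, clean_labels, noisy_labels):
    """Cosine similarity between the clean-label and noisy-label gradients."""
    g_clean = backbone_last_layer_grad(x, U, W, M, clean_labels).ravel()
    g_noisy = backbone_last_layer_grad(x, U, W, M, noisy_labels).ravel()
    return float(g_clean @ g_noisy / (np.linalg.norm(g_clean) * np.linalg.norm(g_noisy)))
```

A value near 1 means the update driven by noisy labels points in nearly the same direction as the clean one. A per-epoch curve like those in the figure could be obtained by averaging this value over the batches of each epoch.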