CoLafier: Collaborative Noisy Label Purifier With Local Intrinsic Dimensionality Guidance

Dongyu Zhang, Ruofan Hu, Elke Rundensteiner

TL;DR

CoLafier tackles the challenge of learning with noisy labels by introducing Local Intrinsic Dimensionality (LID) as a discriminative signal for label reliability. It builds a collaborative two-subnet system, LID-dis and LID-gen, and trains with dual augmented views to produce LID-based instance weights and pseudo-labels, while incorporating CutMix and a cosine-consistency term to robustify learning under noise. The method updates labels only when both views agree and LID-derived criteria are met, then deploys LID-gen as the final classifier. Experiments on CIFAR-10 variants and CIFAR-10N demonstrate strong robustness to various noise patterns, outperforming several state-of-the-art LNL methods, with ablations confirming the value of the dual-view design and the three loss components. Overall, CoLafier provides a practical, LID-guided approach to reliable learning from noisy annotations, without requiring prior noise information.

Abstract

Deep neural networks (DNNs) have advanced many machine learning tasks, but their performance is often harmed by noisy labels in real-world data. Addressing this, we introduce CoLafier, a novel approach that uses Local Intrinsic Dimensionality (LID) for learning with noisy labels. CoLafier consists of two subnets: LID-dis and LID-gen. LID-dis is a specialized classifier. Trained with our uniquely crafted scheme, LID-dis consumes both a sample's features and its label to predict the label, which allows it to produce an enhanced internal representation. We observe that LID scores computed from this representation effectively distinguish between correct and incorrect labels across various noise scenarios. In contrast to LID-dis, LID-gen, functioning as a regular classifier, operates solely on the sample's features. During training, CoLafier feeds two augmented views of each instance to both subnets. CoLafier uses the LID scores that LID-dis produces for the two views to assign weights in an adapted loss function for both subnets. Concurrently, LID-gen, serving as a classifier, suggests pseudo-labels. LID-dis then processes these pseudo-labels along with the two views to derive LID scores. Finally, these LID scores, along with the differences in predictions from the two subnets, guide the label update decisions. This dual-view and dual-subnet approach enhances the overall reliability of the framework. Upon completion of training, we deploy the LID-gen subnet of CoLafier as the final classification model. CoLafier demonstrates improved prediction accuracy, surpassing existing methods, particularly under severe label noise. For more details, see the code at https://github.com/zdy93/CoLafier.
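
The abstract relies on LID scores but does not say how they are estimated. A common choice in the noisy-label literature is the maximum-likelihood estimator computed from k-nearest-neighbor distances, and the sketch below assumes that estimator. The function name `lid_mle`, the parameters `k` and `exclude_self`, and the use of Euclidean distance are illustrative assumptions, not taken from the CoLafier code.

```python
import numpy as np


def lid_mle(queries, reference, k=20, exclude_self=True):
    """Estimate LID for each row of `queries` from its k nearest neighbors
    in `reference`, using the maximum-likelihood estimator
        LID(x) = -1 / mean_i( log(r_i / r_k) ),
    where r_1 <= ... <= r_k are the sorted neighbor distances.

    Hypothetical helper: an illustration of the estimator, not the paper's code.
    """
    queries = np.asarray(queries, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Pairwise Euclidean distances, shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)

    # When the queries are themselves rows of `reference`, the nearest
    # "neighbor" is the point itself at distance 0, so skip that column.
    start = 1 if exclude_self else 0
    if dists.shape[1] < start + k:
        raise ValueError("not enough reference points for the requested k")
    r = dists[:, start:start + k]

    eps = 1e-12  # guards against log(0) and division by zero
    log_ratio = np.log((r + eps) / (r[:, -1:] + eps))
    mean_log = np.minimum(log_ratio.mean(axis=1), -eps)
    return -1.0 / mean_log


# Usage: score a batch of representations against itself.
rng = np.random.default_rng(0)
batch = rng.normal(size=(128, 16))
scores = lid_mle(batch, batch, k=20)  # one LID estimate per row
```

Under this estimator, points whose neighborhoods spread along more directions receive higher scores. The abstract's claim is that such scores, computed on the LID-dis representation, differ between correctly and incorrectly labeled instances.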

Paper Structure

This paper contains 21 sections, 23 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Distribution of LID scores for true-labeled versus false-labeled instances under three noise conditions. The heights of the orange and blue bars give the proportions of true-labeled and false-labeled instances, respectively, whose LID scores fall within each bin. LID scores are computed from the enhanced representation of features and labels in LID-dis. From top to bottom, the noise conditions for the three panels are: 20% instance-dependent noise, 40% instance-dependent noise, and 50% symmetric noise.
  • Figure 2: The overall framework of CoLafier.