LNL+K: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration

Siqi Wang, Bryan A. Plummer

TL;DR

This work tackles learning with noisy labels by introducing noise-source knowledge (NS knowledge) into LNL (LNL+K), leveraging the observation that label noise often originates from a limited set of confusable categories and can be quantified via $p(c|x_i)$ and $p(c_n|x_i)$. It defines a unified clean-sample-detection framework and adapts state-of-the-art LNL methods (CRUST, FINE, SFT, UNICON, DISC) to incorporate NS knowledge, including DualT-based NS estimation. Across six datasets and two noise regimes (dominant and asymmetric), LNL+K yields substantial gains, with up to 23% accuracy improvements in dominant-noise settings and robust improvements under incomplete or estimated NS knowledge. The study introduces the notion of knowledge absorption rate and demonstrates that direct LNL+K investigation is valuable for achieving reliable learning under real-world noisy labeling scenarios, particularly when NS information is partial or noisy.
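As an illustration of how $p(c|x_i)$ and $p(c_n|x_i)$ could drive clean-sample detection, the following is a minimal sketch and not the authors' implementation. It assumes a decision rule suggested by the summary above: a sample labeled $c$ is kept as clean when its probability for $c$ exceeds its probability for every known noise source $c_n$ of $c$. The function name `is_clean`, the `noise_sources` mapping, and the probability values are all hypothetical.

```python
def is_clean(probs, label, noise_sources):
    """Hypothetical LNL+K-style check for one sample.

    probs: dict mapping class name -> estimated p(class | x_i)
    label: the (possibly noisy) class label c assigned to x_i
    noise_sources: dict mapping a class c -> list of its noise-source classes c_n
    """
    sources = noise_sources.get(label, [])
    if not sources:
        # Assumption: with no known noise source for c, keep the label.
        return True
    # Keep the label only if c outscores every known noise source of c.
    return probs[label] > max(probs[c_n] for c_n in sources)


# Assumed knowledge: images labeled "cat" are sometimes really "dog".
noise_sources = {"cat": ["dog"]}

# A hard-negative clean sample: low absolute confidence in "cat",
# but still more cat-like than dog-like.
hard_negative = {"cat": 0.40, "dog": 0.35, "bird": 0.25}

# A likely mislabeled sample: the noise source dominates.
mislabeled = {"cat": 0.30, "dog": 0.60, "bird": 0.10}

print(is_clean(hard_negative, "cat", noise_sources))
print(is_clean(mislabeled, "cat", noise_sources))
```

The comparison is relative rather than absolute: the first sample would fail a fixed confidence threshold of 0.5 on its own class, yet it is retained because it beats the only known noise source.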

Abstract

Learning with noisy labels (LNL) aims to train a high-performing model using a noisy dataset. We observe that noise for a given class often comes from a limited set of categories, yet many LNL methods overlook this. For example, an image mislabeled as a cheetah is more likely a leopard than a hippopotamus due to its visual similarity. Thus, we explore Learning with Noisy Labels with noise source Knowledge integration (LNL+K), which leverages knowledge about likely source(s) of label noise that is often provided in a dataset's meta-data. Integrating noise source knowledge boosts performance even in settings where LNL methods typically fail. For example, LNL+K methods are effective on datasets where noise represents the majority of samples, which breaks a critical premise of most methods developed for LNL. Our LNL+K methods can boost performance even when noise sources are estimated rather than extracted from meta-data. We provide several baseline LNL+K methods that integrate noise source knowledge into state-of-the-art LNL models that are evaluated across six diverse datasets and two types of noise, where we report gains of up to 23% compared to the unadapted methods. Critically, we show that LNL methods fail to generalize on some real-world datasets, even when adapted to integrate noise source knowledge, highlighting the importance of directly exploring LNL+K.

Paper Structure (25 sections, 22 equations, 4 figures, 9 tables, 1 algorithm)

Figures (4)

  • Figure 1: (Best viewed in color.) Comparison of LNL and LNL+K on a hard-negative clean sample. (a) Traditional LNL methods (e.g., [kim2021fine, crust, wei2022self, karim2022unicon]) classify an input image as having a noisy label based on a similarity threshold between the sample and its (majority) class features. (b) In contrast, LNL+K methods identify the sample as having a clean label by considering the noise source, dog. Specifically, since the sample's probability of being a cat is higher than that of being a dog, and it aligns more closely with the cat class in the feature space, LNL+K judges it as more likely a cat image.
  • Figure 2: Comparisons of noise types. Asymmetric noise can occur bidirectionally with limited noise ratios, while Dominant noise can exceed 50% in recessive classes, with clean dominant classes.
  • Figure 3: Class prediction confusion matrix for weak treatment cell images in the CHAMMI-CP dataset [CHAMMI], normalized to sum to 100%. The integration of knowledge (+K) enhances the method's capability to distinguish weak treatment from a high-ratio control class. See Section \ref{subsec:dominant} for more details.
  • Figure 4: Comparisons of knowledge-adaptive methods under different degrees of imperfect noise-source knowledge. (MK: missing knowledge, NK: noisy knowledge, and M&N: the combination of the two.) Note: complete knowledge has 50 noise sources. See Section \ref{sec:discussion} for discussion.