Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

Zehui Liao; Shishuai Hu; Yanning Zhang; Yong Xia

TL;DR

The ENCOFA framework combines two components. The Extended Noise-robust Supervised Contrastive Loss sharpens feature discrimination across both in-distribution and out-of-distribution classes. The Open-set Feature Augmentation module enriches open-set samples at the feature level and dynamically assigns them class labels, which puts the model's spare capacity to use and mitigates overfitting to noisy data.

Abstract

Addressing mixed closed-set and open-set label noise in medical image classification remains a largely unexplored challenge. Unlike natural image classification, which often separates and processes closed-set and open-set noisy samples from clean ones, medical image classification contends with high inter-class similarity, complicating the identification of open-set noisy samples. Additionally, existing methods often fail to fully utilize open-set noisy samples for label noise mitigation, leading to their exclusion or the application of uniform soft labels. To address these challenges, we propose the Extended Noise-robust Contrastive and Open-set Feature Augmentation (ENCOFA) framework for medical image classification tasks. This framework incorporates the Extended Noise-robust Supervised Contrastive Loss, which helps differentiate features among both in-distribution and out-of-distribution classes. This loss treats open-set noisy samples as an extended class, improving label noise mitigation by weighting contrastive pairs according to label reliability. Additionally, we develop the Open-set Feature Augmentation module that enriches open-set samples at the feature level and then assigns them dynamic class labels, thereby leveraging the model's capacity and reducing overfitting to noisy data. We evaluated the proposed framework on both a synthetic noisy dataset and a real-world noisy dataset. The results indicate the superiority of our framework over four existing methods and the effectiveness of leveraging open-set noisy samples to combat label noise.
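The idea of treating open-set noisy samples as one extended class and weighting contrastive pairs by label reliability can be illustrated with a toy example. The sketch below is not the paper's loss: the function name, the `reliability` scores, the product form of the pair weight, and the `temperature` default are all assumptions made for illustration, applied to a standard supervised contrastive formulation.

```python
import math

def extended_weighted_supcon(features, labels, is_open_set, reliability,
                             num_id_classes, temperature=0.1):
    """Toy reliability-weighted supervised contrastive loss (illustrative only).

    features:     list of L2-normalised feature vectors (lists of floats)
    labels:       class index per sample (observed or pseudo label)
    is_open_set:  True where a sample was flagged as open-set noise
    reliability:  hypothetical per-sample label confidence in [0, 1]
    """
    # All flagged open-set samples share one extra class index.
    ext = [num_id_classes if o else y for y, o in zip(labels, is_open_set)]
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        # Scaled dot-product similarity between anchor i and every other sample.
        sims = {j: sum(a * b for a, b in zip(features[i], features[j])) / temperature
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        positives = [j for j in sims if ext[j] == ext[i]]
        if not positives:
            continue  # anchor has no same-class partner in this batch
        # Assumed pair weight: product of the two samples' reliabilities.
        loss_i = -sum(reliability[i] * reliability[j] * (sims[j] - log_denom)
                      for j in positives) / len(positives)
        total += loss_i
        anchors += 1
    return total / anchors if anchors else 0.0
```

With this weighting, a pair involving an unreliable label pulls its features together less strongly than a pair of trusted labels, and open-set samples are pushed away from every known class because they form their own group.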

Paper Structure (22 sections, 8 equations, 6 figures, 7 tables)

Introduction
Related Work
Learning with Open-set and Closed-set Label Noise
Label Noise-robust Medical Image Classification
Method
Problem Formalization and Method Overview
Classification Backbone
Noise Type Identifier
Classification Loss
Extended Noise-robust Supervised Contrastive Loss
OSFeatAug Module
Loss Function
Datasets
Experiments and Results
Implementation Details
...and 7 more sections

Figures (6)

Figure 1: Types of data samples illustrating: (a) Clean data, where samples are correctly labeled, (b) Closed-set noisy data, where in-distribution (ID) samples are mislabeled as other known classes, and (c) Open-set noisy data, where out-of-distribution (OOD) samples are mislabeled as any of the known classes. These samples are sourced from the Kather5k dataset, which comprises eight classes: TUM, s-STR, c-STR, LYM, Norm, DEB, ADI, and Back. To simulate noise, the first five classes are considered ID classes, while the last three are regarded as OOD classes. Each sample is annotated with a green tag indicating its true class. An orange box around an image denotes a mislabeled sample.
Figure 2: Illustration of feature distributions for both ID and OOD test data from the NoisyKather5k dataset. Sub-figures (a), (b), and (c) depict image features extracted by models trained with clean data, closed-set noisy data, and open-set noisy data, respectively. The closed-set classes consist of TUM, s-STR, c-STR, LYM, and Norm, while the open-set classes comprise DEB, ADI, and Back.
Figure 3: Comparison between (a) a representative previous framework and (b) our ENCOFA framework designed to address mixed open-set and closed-set label noise. The previous framework distinguishes among clean samples, closed-set noisy samples, and open-set noisy samples. It trains the model using clean samples and closed-set noisy samples. In contrast, our ENCOFA framework enhances the model by introducing an extended noise-robust supervised contrastive loss $L_{ensc}$ alongside the classification loss $L_{cls}$. This approach significantly improves the identification accuracy of open-set noisy samples. Furthermore, ENCOFA utilizes detected open-set noisy samples to enrich them through the OSFeatAug module. The enriched open-set features are assigned dynamic labels, effectively consuming the model’s additional capacity and thereby preventing overfitting.
Figure 4: Illustration of the ENCOFA framework. Samples are classified into clean, closed-set noisy, or open-set noisy categories using a noise type identifier, and then processed through an encoder and a FC layer. Open-set noisy sample features are augmented using the OSFeatAug module. Classification losses for clean, closed-set noisy, and open-set noisy samples are computed based on their observed, pseudo, or random labels, respectively. Meanwhile, the output features from the projector are utilized to calculate the ENSC loss.
Figure 5: Performance comparison of ENCOFA variants with respect to (a) $\gamma_{CL}$, (b) $\gamma_{OOD}$, and (c) $\lambda$, evaluated on the NoisyKather5k dataset ($\alpha$=0.4, $\beta$=0.25). The variant '$\mathcal{L}_{cls}^{CL}+\mathcal{L}_{cls}^{CN}$' identifies noisy samples within the training data and utilizes both clean and noisy data to train the classification backbone, supervised by observed and pseudo labels, respectively. The variant '$\mathcal{L}_{cls}^{CL}+\mathcal{L}_{cls}^{CN}+\mathcal{L}_{cls}^{ON}$' detects closed-set and open-set noisy samples across all training data, with these two types of noisy data supervised by pseudo and dynamic labels, respectively. The variant '$\mathcal{L}_{cls}+\mathcal{L}_{ensc}$' integrates the extended noise-robust supervised contrastive loss, building upon '$\mathcal{L}_{cls}^{CL}+\mathcal{L}_{cls}^{CN}+\mathcal{L}_{cls}^{ON}$'.
...and 1 more figure
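The three sample types defined in the Figure 1 caption can be mimicked with a short simulation. This is a minimal sketch under assumptions: the function name and the `closed_rate` parameter are hypothetical, uniform random flipping is an illustrative choice, and the snippet does not reproduce how the NoisyKather5k dataset was actually constructed.

```python
import random

# Class split taken from the Figure 1 caption.
ID_CLASSES = ["TUM", "s-STR", "c-STR", "LYM", "Norm"]
OOD_CLASSES = ["DEB", "ADI", "Back"]

def observed_label(true_class, closed_rate, rng):
    """Return (observed_label, sample_type) for one sample.

    closed_rate is a hypothetical probability of flipping an ID label.
    """
    if true_class in OOD_CLASSES:
        # Open-set noise: an OOD sample is given some known-class label.
        return rng.choice(ID_CLASSES), "open-set noisy"
    if rng.random() < closed_rate:
        # Closed-set noise: an ID sample is flipped to a different known class.
        others = [c for c in ID_CLASSES if c != true_class]
        return rng.choice(others), "closed-set noisy"
    return true_class, "clean"

rng = random.Random(0)  # fixed seed so the simulation is repeatable
for cls in ["TUM", "LYM", "ADI"]:
    print(cls, observed_label(cls, closed_rate=0.4, rng=rng))
```

Every OOD sample necessarily ends up mislabeled because its true class is not among the known labels, which is what distinguishes open-set noise from the closed-set case.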
