How does self-supervised pretraining improve robustness against noisy labels across various medical image classification datasets?

Bidur Khanal; Binod Bhattarai; Bishesh Khanal; Cristian Linte

TL;DR

This work addresses robustness of medical image classification under label noise by proposing a two-stage pipeline: self-supervised pretraining followed by supervised learning with learning-with-noisy-label (LNL) techniques. It comprehensively evaluates eight self-supervised methods and two LNL strategies across five diverse datasets, introducing dataset-specific difficulty ranking via Fisher's CSS and systematic class-grouping. The key finding is that contrastive self-supervised pretraining most consistently enhances robustness to noisy labels, with DermNet being the most challenging yet displaying notable noise resilience; gains are larger under symmetrical noise and diminish under class-dependent noise. The results offer practical guidance on selecting SSL objectives and demonstrate how dataset properties influence robustness, informing deployment of SSL-LNL approaches in medical imaging.
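The two noise types mentioned above can be illustrated with a minimal sketch. It assumes symmetric noise flips a label to any other class with equal probability, and class-dependent noise flips a label only to a fixed, class-specific target. The function names, the `flip_map` argument, and the exact flipping rule are illustrative assumptions, not the authors' protocol.

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """With probability noise_rate, replace a label by a different class
    drawn uniformly at random (assumed definition of symmetric noise)."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            candidates = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(y)
    return noisy


def inject_class_dependent_noise(labels, flip_map, noise_rate, seed=0):
    """With probability noise_rate, replace a label by its class-specific
    target in flip_map (hypothetical mapping); unmapped classes stay clean."""
    rng = random.Random(seed)
    return [flip_map.get(y, y) if rng.random() < noise_rate else y
            for y in labels]


if __name__ == "__main__":
    clean = [0, 1, 2, 3] * 5
    print(inject_symmetric_noise(clean, num_classes=4, noise_rate=0.4))
    print(inject_class_dependent_noise(clean, {0: 1, 2: 3}, noise_rate=0.4))
```

Under class-dependent noise the corrupted labels are concentrated on a few confusable class pairs, which is one plausible reason such noise is harder to filter than uniformly spread flips.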

Abstract

Noisy labels can significantly impact medical image classification, particularly in deep learning, by corrupting learned features. Self-supervised pretraining, which doesn't rely on labeled data, can enhance robustness against noisy labels. However, this robustness varies based on factors like the number of classes, dataset complexity, and training size. In medical images, subtle inter-class differences and modality-specific characteristics add complexity. Previous research hasn't comprehensively explored the interplay between self-supervised learning and robustness against noisy labels in medical image classification, considering all these factors. In this study, we address three key questions: i) How does label noise impact various medical image classification datasets? ii) Which types of medical image datasets are more challenging to learn and more affected by label noise? iii) How do different self-supervised pretraining methods enhance robustness across various medical image datasets? Our results show that DermNet, among five datasets (Fetal plane, DermNet, COVID-DU-Ex, MURA, NCT-CRC-HE-100K), is the most challenging but exhibits greater robustness against noisy labels. Additionally, contrastive learning stands out among the eight self-supervised methods as the most effective approach to enhance robustness against noisy labels.

Paper Structure (27 sections, 2 equations, 6 figures, 11 tables)

This paper contains 27 sections, 2 equations, 6 figures, 11 tables.

Introduction
Related Works on Learning with Noisy Labels in Medical Image Classification
Method
Problem Setup
Label Noise Injection
Impact of Noisy Labels
Assess Dataset Difficulty:
Robustness to Label Noise
Proposed Pipeline
Self-supervised Pretraining
Supervised Learning with LNL
Datasets
Experiments
Implementation Details for Problem Setup
Creating Joint Feature Space
...and 12 more sections

Figures (6)

Figure 1: The assessment of test performance relative to training label noise rate (noise probability). BEST represents peak performance, and LAST denotes the average over the last five epochs. The blue region indicates noise rates below the flipping threshold, while the red region signifies rates above the threshold.
Figure 5: Comparison of the robustness score across various datasets. A higher value of R denotes relatively greater robustness against noisy labels. R for $\epsilon \leq 0.4$ indicates that the robustness score was computed over the label noise range $[0, 0.4]$.
Figure 6: The overall pipeline consists of two stages: a) self-supervised pretraining to learn a robust feature extractor $G_\theta$, and b) supervised training on noisy labels to build a robust classifier ($G_\theta$; $F_\alpha$). During self-supervised learning, there is no use of the provided labels $Y$; instead, it relies on self-generated pseudo labels $\hat{Y}$. We explore various self-supervised learning objectives based on pretext tasks, contrastive learning, and generative methods. Supervised training employs the LNL method, which robustly trains the classifier using the noisy labels $Y$.
Figure 11: Comparing the test performance of Co-teaching (CT) with various self-supervised pretraining (plastic backbone) at different class-dependent label noise rates, across five datasets. CE stands for standard cross-entropy. BEST and LAST show the best performance and the average of the last five epochs, respectively, on the test set.
Figure 12: Comparing the test performance of DivideMix (DM) with various self-supervised pretraining (plastic backbone) at different class-dependent label noise rates, across five datasets. CE stands for standard cross-entropy. BEST and LAST show the best performance and the average of the last five epochs, respectively, on the test set.
...and 1 more figures
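The BEST and LAST metrics described in the Figure 1 caption can be computed from a per-epoch record of test performance. The following is a minimal sketch; the function name `best_and_last` and the list-based input are assumptions made for illustration, not part of the paper's code.

```python
def best_and_last(test_scores_per_epoch, last_k=5):
    """Return (BEST, LAST) for a list of per-epoch test scores.

    BEST is the peak score over all epochs; LAST is the mean of the
    final last_k epochs (five, following the figure captions).
    """
    if not test_scores_per_epoch:
        raise ValueError("need at least one epoch of test scores")
    best = max(test_scores_per_epoch)
    tail = test_scores_per_epoch[-last_k:]
    last = sum(tail) / len(tail)
    return best, last


if __name__ == "__main__":
    # Hypothetical accuracy curve that peaks early and then degrades.
    scores = [0.50, 0.60, 0.90, 0.80, 0.70, 0.70, 0.70, 0.70]
    best, last = best_and_last(scores)
    print(f"BEST={best:.2f} LAST={last:.2f}")
```

A large gap between BEST and LAST usually suggests that the model fit the noisy labels late in training, which is why the captions report both values.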
