Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise

Bidur Khanal, Tianhong Dai, Binod Bhattarai, Cristian Linte

TL;DR

This approach not only improves the robustness of medical image classification in the presence of noisy labels but also iteratively improves the quality of the dataset by relabeling the important incorrect labels under a limited annotation budget.

Abstract

The robustness of supervised deep learning-based medical image classification is significantly undermined by label noise. Although several methods have been proposed to enhance classification performance in the presence of noisy labels, they face two challenges: 1) they struggle with class-imbalanced datasets, frequently misidentifying samples from minority classes as noisy; 2) they focus solely on maximizing performance with noisy datasets, without incorporating experts in the loop to actively clean the noisy labels. To mitigate these challenges, we propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning. This approach not only improves the robustness of medical image classification in the presence of noisy labels but also iteratively improves the quality of the dataset by relabeling the important incorrect labels under a limited annotation budget. Furthermore, we introduce a novel Variance of Gradients approach in the LNL phase, which complements the loss-based sample selection by also sampling under-represented samples. Using two imbalanced noisy medical classification datasets, we demonstrate that our proposed technique is superior to its predecessors at handling class imbalance, as it avoids misidentifying clean samples from minority classes as noisy.
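To make the clean-noisy selection step more concrete, here is a minimal pure-Python sketch under stated assumptions. It scores each sample with a VoG-style statistic (the standard deviation of its gradients across training checkpoints, computed per input dimension and then averaged) and marks as clean the union of the lowest-loss and lowest-VoG fractions of the training set. The input layout, the names `vog_score` and `select_clean`, the shared fraction `frac`, and the union rule are hypothetical illustrations, not the authors' implementation; a real pipeline would take the gradients and losses from the trained network.

```python
from statistics import fmean, pstdev


def vog_score(checkpoint_grads: list[list[float]]) -> float:
    """VoG-style score for one sample (hypothetical input layout).

    checkpoint_grads[k][d] is the gradient of the sample's class score with
    respect to input dimension d at training checkpoint k.
    """
    # zip(*...) regroups values so each tuple holds one dimension's
    # gradients across all checkpoints; pstdev measures their spread.
    per_dim_std = [pstdev(dim) for dim in zip(*checkpoint_grads)]
    # Average the per-dimension spread into a single scalar score.
    return fmean(per_dim_std)


def select_clean(
    losses: list[float], vogs: list[float], frac: float
) -> tuple[list[int], list[int]]:
    """Assumed rule: clean = small-loss selection UNION small-VoG selection."""
    n = len(losses)
    k = int(frac * n)  # number of samples kept by each criterion
    by_loss = sorted(range(n), key=lambda i: losses[i])[:k]
    by_vog = sorted(range(n), key=lambda i: vogs[i])[:k]
    clean_set = set(by_loss) | set(by_vog)
    clean = sorted(clean_set)
    noisy = [i for i in range(n) if i not in clean_set]
    return clean, noisy


# Toy usage: 3 samples, 2 checkpoints, 2 input dimensions each.
grads = [
    [[0.0, 0.0], [0.0, 0.0]],  # sample 0: perfectly stable gradients
    [[0.0, 0.0], [2.0, 2.0]],  # sample 1
    [[0.0, 0.0], [4.0, 4.0]],  # sample 2: least stable gradients
]
vogs = [vog_score(g) for g in grads]
clean, noisy = select_clean([3.0, 0.1, 0.2], vogs, frac=0.34)
# clean == [0, 1], noisy == [2]
```

In this toy example, sample 0 has the highest loss but the most stable gradients, so the VoG criterion keeps it even though the small-loss criterion alone would discard it. This is the kind of complementary effect the abstract attributes to the Variance of Gradients term for under-represented samples.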

Paper Structure (18 sections, 1 equation, 8 figures, 1 table)

Figures (8)

  • Figure 1: Active Label Cleaning Pipeline: 1) Learning with Noisy Labels (LNL), where the clean-noisy selection process includes selections from both the small Variance of Gradients (VOG) and the small loss ($\mathcal{L}$) criteria; 2) Active Label Cleaning, wherein the noisy samples discarded by LNL are iteratively sampled using an active sampler ($\mathbf{\Phi}$) and relabeled.
  • Figure 2: Comparison of the macro-averaged test F1-score across various baselines on the ISIC-2019 dataset at two noise rates: $p = 0.4$ (left) and $p = 0.5$ (right).
  • Figure 3: Comparison of the macro-averaged test F1-score across various baselines on the long-tailed NCT-CRC-HE-100K dataset at two noise rates: $p = 0.7$ (left) and $p = 0.8$ (right).
  • Figure 4: Class distribution in the ISIC-2019 dataset
  • Figure 5: Class distribution in the long-tailed NCT-CRC-HE-100K dataset
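The second phase in Figure 1 (Active Label Cleaning) can be sketched as a budgeted loop: in each round, the samples that LNL discarded as noisy are ranked by an active-sampler score, the top few are sent to an expert for relabeling, and the process repeats. The sketch below is a hedged illustration only. Predictive entropy stands in for the active sampler $\mathbf{\Phi}$ purely as a common uncertainty criterion; the paper's actual sampler, the names `active_label_cleaning` and `oracle`, the per-round budget, and the fixed predicted probabilities are all assumptions.

```python
from math import log
from typing import Callable


def entropy(probs: list[float]) -> float:
    """Predictive entropy; a stand-in score for the active sampler."""
    return -sum(p * log(p) for p in probs if p > 0)


def active_label_cleaning(
    noisy_ids: list[int],
    labels: list[int],
    probs: list[list[float]],
    oracle: Callable[[int], int],
    budget_per_round: int,
    rounds: int,
    score: Callable[[list[float]], float] = entropy,
) -> tuple[list[int], list[int]]:
    """Relabel LNL-discarded samples, highest score first, within a budget."""
    labels = list(labels)  # copy so the caller's labels are not mutated
    pool = list(noisy_ids)
    relabeled: list[int] = []
    for _ in range(rounds):
        if not pool:
            break  # every discarded sample has already been relabeled
        # Rank the remaining discarded samples by sampler score, highest first.
        pool.sort(key=lambda i: score(probs[i]), reverse=True)
        batch, pool = pool[:budget_per_round], pool[budget_per_round:]
        for i in batch:
            labels[i] = oracle(i)  # the expert supplies the corrected label
            relabeled.append(i)
        # In a full pipeline the classifier would be retrained on the enlarged
        # clean set here and `probs` refreshed; this sketch keeps them fixed.
    return labels, relabeled


# Toy usage: three discarded samples, one relabel per round, two rounds.
new_labels, order = active_label_cleaning(
    noisy_ids=[0, 1, 2],
    labels=[0, 0, 0],
    probs=[[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]],
    oracle=lambda i: 1,  # hypothetical expert who always answers class 1
    budget_per_round=1,
    rounds=2,
)
# order == [1, 2]; new_labels == [0, 1, 1]
```

Because the budget caps how many samples reach the expert, the most uncertain samples (here, sample 1 with a 50/50 prediction) are relabeled first, and the confidently predicted sample 0 is never sent for review within the two rounds.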