Semi-Supervised Disease Classification based on Limited Medical Image Data

Yan Zhang, Chun Li, Zhaoxia Liu, Ming Li

TL;DR

This work tackles semi-supervised disease classification in the positive-unlabeled (PU) learning setting, where labeled medical images are scarce. It introduces HD-PAN, an adversarial framework that replaces the GAN generator with a classifier and uses Hölder divergence to train on positive and unlabeled data, removing the need for a known class prior. In experiments on five medical-imaging benchmarks, HD-PAN achieves state-of-the-art performance and is reported to be robust to the choice of the Hölder parameter. The approach could reduce the annotation burden in medical image analysis and support more scalable, data-efficient diagnosis pipelines.

Abstract

In recent years, significant progress has been made in the field of learning from positive and unlabeled examples (PU learning), particularly in the context of advancing image and text classification tasks. However, applying PU learning to semi-supervised disease classification remains a formidable challenge, primarily due to the limited availability of labeled medical images. In the realm of medical image-aided diagnosis algorithms, numerous theoretical and practical obstacles persist. The research on PU learning for medical image-assisted diagnosis holds substantial importance, as it aims to reduce the time spent by professional experts in classifying images. Unlike natural images, medical images are typically accompanied by a scarcity of annotated data, while an abundance of unlabeled cases exists. Addressing these challenges, this paper introduces a novel generative model inspired by Hölder divergence, specifically designed for semi-supervised disease classification using positive and unlabeled medical image data. In this paper, we present a comprehensive formulation of the problem and establish its theoretical feasibility through rigorous mathematical analysis. To evaluate the effectiveness of our proposed approach, we conduct extensive experiments on five benchmark datasets commonly used in PU medical learning: BreastMNIST, PneumoniaMNIST, BloodMNIST, OCTMNIST, and AMD. The experimental results clearly demonstrate the superiority of our method over existing approaches based on KL divergence. Notably, our approach achieves state-of-the-art performance on all five disease classification benchmarks. By addressing the limitations imposed by limited labeled data and harnessing the untapped potential of unlabeled medical images, our novel generative model presents a promising direction for enhancing semi-supervised disease classification in the field of medical image analysis.
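
The abstract contrasts Hölder divergence with KL divergence but does not state a formula. As a rough illustration only, the sketch below computes the Hölder pseudo-divergence known from the general literature for two discrete distributions, $D_\alpha^H(p:q) = -\log \frac{\sum_i p_i q_i}{\lVert p \rVert_\alpha \lVert q \rVert_\beta}$ with $1/\alpha + 1/\beta = 1$. The function name `holder_pseudo_divergence` and the choice of this particular variant are assumptions made for illustration; this is not the HD-PAN training objective, which is not given in this section.

```python
import math


def holder_pseudo_divergence(p, q, alpha=2.0):
    """Hölder pseudo-divergence between two discrete distributions.

    Assumption: this is the textbook form built from Hölder's inequality,
    not the loss used by HD-PAN. `alpha` must be > 1; its conjugate
    exponent beta satisfies 1/alpha + 1/beta = 1.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")

    beta = alpha / (alpha - 1.0)  # conjugate exponent

    inner = sum(pi * qi for pi, qi in zip(p, q))             # <p, q>
    norm_p = sum(pi ** alpha for pi in p) ** (1.0 / alpha)   # ||p||_alpha
    norm_q = sum(qi ** beta for qi in q) ** (1.0 / beta)     # ||q||_beta

    if inner == 0.0:
        # Disjoint supports: the ratio is 0, so the divergence is infinite.
        return math.inf

    # Hölder's inequality gives inner <= norm_p * norm_q, so the result is >= 0.
    return -math.log(inner / (norm_p * norm_q))


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print(holder_pseudo_divergence(p, q, alpha=2.0))
    print(holder_pseudo_divergence(p, p, alpha=2.0))
```

With `alpha=2.0` the conjugate exponent is also 2 and the quantity reduces to the Cauchy-Schwarz divergence, which is zero when the two distributions coincide. For other values of `alpha` the pseudo-divergence vanishes when $p^\alpha$ is proportional to $q^\beta$, so it is not generally zero for identical inputs.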

Paper Structure (27 sections, 10 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: The Framework of HD-PAN. HD-PAN presents a comparison between two models, $D(\cdot)$ and $C(\cdot)$. The classifier $C(\cdot)$ aims to assess the probability of the unlabeled data from set $U$ being positive data, subsequently forwarding the data to the discriminator $D(\cdot)$ to determine its authenticity as positive data.
  • Figure 2: Visualization of Quantitative Evaluation: The figure presents a comprehensive quantitative evaluation based on inter-class and intra-class experimental results conducted on two medical image datasets, namely BreastMNIST and PneumoniaMNIST. Subfigures (a) and (b) showcase the inter-class experimental results of BreastMNIST, while (c) and (d) illustrate the intra-class experimental outcomes for the same dataset. Additionally, subfigures (e) and (f) depict the inter-class experimental results for PneumoniaMNIST, and (g) and (h) present the intra-class experimental results for PneumoniaMNIST.
  • Figure 3: Visualization of Quantitative Evaluation: The figure presents a detailed quantitative evaluation through inter-class and intra-class experimental results conducted on two medical image datasets, OCTMNIST and BloodMNIST. Subfigures (a) and (b) illustrate the inter-class experimental outcomes for OCTMNIST, while (c) and (d) showcase the intra-class experimental results for the same dataset. Furthermore, subfigures (e) and (f) depict the inter-class experimental results for BloodMNIST, and (g) and (h) showcase the intra-class experimental outcomes for BloodMNIST, respectively.
  • Figure 4: Quantitative Evaluation Visualization: The figure illustrates the intra-class and inter-class experimental outcomes for the AMD dataset. Subfigures (a) and (b) depict the inter-class experimental results, while subfigures (c) and (d) showcase the intra-class experimental outcomes.
  • Figure 5: AMD and Non-AMD Examples: Panels (a)-(c) showcase examples of AMD, while panels (d)-(f) depict examples of Non-AMD.
  • ...and 1 more figure