Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions

Cheng Jin; Zhengrui Guo; Yi Lin; Luyang Luo; Hao Chen

TL;DR

This survey organizes label-efficient learning in medical image analysis into four annotation scenarios (no label, insufficient label, inexact label, and label refinement) and reviews more than 350 studies across imaging modalities. It shows how self-supervised, semi-supervised, weakly supervised, and active learning strategies can reduce the annotation burden, with health foundation models (HFMs) playing a central role in pre-training and transfer. The authors synthesize representative methods (reconstruction-based, context-based, contrastive, multi-instance learning (MIL), pseudo-labeling, generative modeling, and regularization) and discuss challenges in generalization, benchmarking, and clinical deployment. They also outline future directions, including HFMs, human-in-the-loop (HITL) learning, generative augmentation, federated learning, and standardized evaluation pipelines, to accelerate translation to clinical practice.

Abstract

Deep learning has significantly advanced medical imaging analysis (MIA), achieving state-of-the-art performance across diverse clinical tasks. However, its success largely depends on large-scale, high-quality labeled datasets, which are costly and time-consuming to obtain due to the need for expert annotation. To mitigate this limitation, label-efficient deep learning methods have emerged to improve model performance under limited supervision by leveraging labeled, unlabeled, and weakly labeled data. In this survey, we systematically review over 350 peer-reviewed studies and present a comprehensive taxonomy of label-efficient learning methods in MIA. These methods are categorized into four labeling paradigms: no label, insufficient label, inexact label, and label refinement. For each category, we analyze representative techniques across imaging modalities and clinical applications, highlighting shared methodological principles and task-specific adaptations. We also examine the growing role of health foundation models (HFMs) in enabling label-efficient learning through large-scale pre-training and transfer learning, enhancing the use of limited annotations in downstream tasks. Finally, we identify current challenges and future directions to facilitate the translation of label-efficient learning from research promise to everyday clinical care.

Paper Structure (39 sections, 6 figures, 15 tables)

Introduction
No Label
Reconstruction-Based Methods
Context-Based Methods
Contrastive-Based Methods
Discussion
Insufficient Label
Proxy-labeling Methods
Self-training Methods
Multi-view Learning Methods
Generative Modeling Methods
Regularization-based Methods
Discussion
Inexact Label
Annotation-Efficient Learning
...and 24 more sections

Figures (6)

Figure 1: Overview of this survey. The survey categorizes approaches into four labeling scenarios: no label (Section \ref{sec:ssl}), insufficient label (Section \ref{sec:semi}), inexact label (Section \ref{sec:mil}), and label refinement (Section \ref{sec:al}). The figure illustrates the disparity between data growth and annotator scarcity, the core techniques employed in each scenario, and trends in label-efficient learning applications. For the detailed survey scope, see Appendix \ref{appendix1}.
Figure 2: Overview of the self-supervised learning (Self-SL) paradigm, which addresses the no label scenario. Self-SL pre-trains a model on proxy tasks built solely from unlabeled data. The pre-trained model can then be fine-tuned on different downstream tasks with labeled datasets.
Figure 3: Overview of the semi-supervised learning (Semi-SL) paradigm, which addresses the insufficient label scenario. Semi-SL leverages a small amount of labeled data and a large amount of unlabeled data to jointly train a model. The goal is to ensure prediction consistency between labeled and unlabeled examples, typically by optimizing a combination of supervised ($\mathcal{L}_{sup}$) and unsupervised ($\mathcal{L}_{unsup}$) loss functions.
Figure 4: Overview of the weakly supervised learning paradigm, which addresses the inexact label scenario. On the annotation side, the annotation-efficient learning strategy aims to maximize the utility of limited or partially annotated data while minimizing the annotation burden. On the task side, the multi-instance learning strategy aims to bridge the label granularity gap by inferring fine-grained predictions from coarse supervision.
Figure 5: Overview of the active learning (AL) paradigm, which addresses the label refinement scenario. AL iteratively trains a model by employing informative sampling strategies to select and annotate the most valuable unlabeled samples, thereby maximizing model performance under a limited annotation budget.
...and 1 more figure
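
The Figure 3 caption describes Semi-SL as optimizing a combination of a supervised loss ($\mathcal{L}_{sup}$) and an unsupervised loss ($\mathcal{L}_{unsup}$). The following is a minimal sketch of one way such a combined objective could look, not the survey's formulation. It assumes a cross-entropy supervised term, a mean-squared consistency term between predictions for two perturbed views of the same unlabeled image, and a weighted sum with a hypothetical weight `lam`. All function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised term for one labeled example (assumed choice of L_sup)."""
    return -math.log(softmax(logits)[label])


def consistency(logits_a, logits_b):
    """Unsupervised term for one unlabeled example (assumed choice of L_unsup):
    mean squared difference between the predicted class probabilities
    for two perturbed views of the same image."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)


def semi_supervised_loss(labeled, unlabeled, lam=1.0):
    """Combine both terms as L_sup + lam * L_unsup.

    labeled:   list of (logits, label) pairs
    unlabeled: list of (logits_view_1, logits_view_2) pairs
    lam:       hypothetical weight on the unsupervised term
    Returns (total, l_sup, l_unsup).
    """
    l_sup = sum(cross_entropy(z, y) for z, y in labeled) / len(labeled)
    l_unsup = sum(consistency(a, b) for a, b in unlabeled) / len(unlabeled)
    return l_sup + lam * l_unsup, l_sup, l_unsup


if __name__ == "__main__":
    labeled = [([2.0, 0.0], 0), ([0.5, 1.5], 1)]
    unlabeled = [([1.0, 0.0], [0.8, 0.1]), ([0.0, 2.0], [0.2, 1.7])]
    total, l_sup, l_unsup = semi_supervised_loss(labeled, unlabeled, lam=0.5)
    print(f"L_sup={l_sup:.4f}  L_unsup={l_unsup:.4f}  total={total:.4f}")
```

In practice the logits would come from a network and the two views from data augmentation or a teacher and student pair. The weighting scheme is a design choice that varies between methods.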
