Semi-Supervised Biomedical Image Segmentation via Diffusion Models and Teacher-Student Co-Training

Luca Ciampi, Gabriele Lagani, Giuseppe Amato, Fabrizio Falchi

TL;DR

The paper tackles label-efficient biomedical image segmentation with a diffusion-inspired semi-supervised framework that combines unsupervised, two-pathway pretraining of a teacher model with a teacher–student co-training loop driven by cross pseudo-supervision (CPS). It further improves pseudo-label quality through multiple rounds of diffusion-based refinement, yielding strong performance under limited annotations on several 2D datasets (GlaS, PH2, HMEPS) and a 3D MRI dataset (LA). Key contributions include cycle-consistency-based unsupervised teacher pretraining, CPS-driven semi-supervised co-training, and iterative diffusion rounds with alignment and reconstruction losses, all validated against state-of-the-art (SOTA) baselines. The method improves data efficiency in medical image segmentation and applies to both 2D and 3D data, and the code is released for reproducibility.
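As a rough illustration of the co-training objective, the sketch below assumes the standard bidirectional cross pseudo-supervision formulation (Chen et al., CVPR 2021) and treats each model's output as per-pixel class logits of shape (C, H, W), ignoring the diffusion sampling that actually produces the masks. The function names are hypothetical and are not taken from the authors' repository. On labeled samples both models receive cross-entropy against the ground truth; on unlabeled samples each model's hard prediction supervises the other.

```python
import numpy as np

def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_cross_entropy(logits, target):
    """Mean per-pixel cross-entropy; logits: (C, H, W), target: (H, W) integer labels."""
    probs = softmax(logits, axis=0)
    h, w = target.shape
    # Gather the predicted probability of the target class at every pixel
    p_true = probs[target, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(-np.log(np.clip(p_true, 1e-12, 1.0)).mean())

def co_training_loss(teacher_logits, student_logits, label=None):
    """Hypothetical per-sample loss: supervised CE if a label exists, CPS otherwise."""
    if label is not None:
        # Labeled sample: both models are supervised by the ground truth
        return (pixel_cross_entropy(teacher_logits, label)
                + pixel_cross_entropy(student_logits, label))
    # Unlabeled sample: each model's argmax mask is the pseudo-label for the other
    teacher_pseudo = teacher_logits.argmax(axis=0)
    student_pseudo = student_logits.argmax(axis=0)
    return (pixel_cross_entropy(student_logits, teacher_pseudo)
            + pixel_cross_entropy(teacher_logits, student_pseudo))
```

Using hard (argmax) pseudo-labels rather than soft probabilities follows the original CPS formulation; whether the paper uses hard or soft targets is not stated in this summary.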

Abstract

Supervised deep learning for semantic segmentation has achieved excellent results in accurately identifying anatomical and pathological structures in medical images. However, it often requires large annotated training datasets, which limits its scalability in clinical settings. Semi-supervised learning, which leverages both labeled and unlabeled data, is a well-established approach to this challenge. In this paper, we introduce a novel semi-supervised teacher-student framework for biomedical image segmentation, inspired by the recent success of generative models. Our approach leverages denoising diffusion probabilistic models (DDPMs) to generate segmentation masks by progressively refining noisy inputs conditioned on the corresponding images. The teacher model is first trained in an unsupervised manner using a cycle-consistency constraint based on noise-corrupted image reconstruction, enabling it to generate informative semantic masks. Subsequently, the teacher is integrated into a co-training process with a twin-student network. The student learns from ground-truth labels when available and from teacher-generated pseudo-labels otherwise, while the teacher continuously improves its pseudo-labeling capabilities. Finally, to further enhance performance, we introduce a multi-round pseudo-label generation strategy that iteratively improves the pseudo-labeling process. We evaluate our approach on several biomedical imaging benchmarks spanning different imaging modalities and segmentation tasks. Experimental results show that our method consistently outperforms state-of-the-art semi-supervised techniques, highlighting its effectiveness in scenarios with limited annotated data. The code to replicate our experiments can be found at https://github.com/ciampluca/diffusion_semi_supervised_biomedical_image_segmentation
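The mask-generation idea can be sketched with the standard DDPM forward process of Ho et al. (2020), x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, applied to the segmentation mask, with conditioning done by concatenating the noisy mask and the clean image along the channel axis (as the Figure 2 caption describes for the mask pathway). The linear beta schedule, T = 1000, the [-1, 1] mask scaling and the helper names are assumptions for illustration, not the paper's reported settings.

```python
import numpy as np

T = 1000                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed, as in Ho et al., 2020)
alpha_bars = np.cumprod(1.0 - betas)      # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

def mask_pathway_input(mask, image, t, rng):
    """Hypothetical helper building the conditioned input of the mask pathway.

    mask:  (K, H, W) segmentation mask scaled to [-1, 1] (assumed)
    image: (C, H, W) clean input image
    Returns the channel-wise concatenation [noisy mask ; clean image].
    """
    noise = rng.standard_normal(mask.shape)
    noisy_mask = q_sample(mask, t, noise)
    return np.concatenate([noisy_mask, image], axis=0)
```

In this reading, the network would be trained to map the concatenated tensor back to the clean mask; at inference, a mask would be generated by starting from pure noise and denoising step by step while the image channels stay fixed.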

Paper Structure

This paper contains 20 sections, 17 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of the proposed teacher-student architecture for semi-supervised biomedical segmentation. On the right, we depict the student-teacher co-training process, where both models are jointly optimized. Specifically, they are both based on a UNet architecture and draw inspiration from denoising diffusion probabilistic models (DDPMs), learning to generate semantic segmentation masks by starting from a noise vector conditioned on an input image. When ground-truth labels are available, both models are trained using a standard cross-entropy loss between predictions and the ground truth. In the absence of annotations, co-training is guided by cross pseudo-supervision (CPS), where the predictions of the teacher serve as pseudo-labels. To ensure that the teacher produces informative pseudo-labels, it undergoes a preliminary unsupervised training phase (shown on the left). This training follows a dual-pathway approach: first, it generates a segmentation mask from a noise vector conditioned on an input image, then reconstructs the original image using the generated mask and a noise-corrupted version of the input.
  • Figure 2: Overview of the unsupervised teacher pretraining. As with standard denoising diffusion methods, the architecture is based on a UNet model. However, we introduce two alternating computational pathways: (i) a mask pathway (top) and (ii) an image pathway (bottom). In the mask pathway, the network processes a noise-corrupted segmentation mask, concatenated with a clean input image, and aims to generate a noise-free mask. In the image pathway, the network receives a noise-corrupted image, concatenated with a clean mask, and is tasked with predicting the initial noise added to the image. Finally, the original image is reconstructed from the predicted noise following the approach of Ho et al. (NeurIPS 2020). This reconstructed image is used to compute a cycle-consistency loss, which enables the teacher to generate meaningful masks conditioned on image samples.
  • Figure 3: Qualitative results of our semi-supervised approach in the 20% label-scarcity setting. Each row corresponds to a different dataset from our experimental evaluation; the columns show, in order, the input sample, the pseudo-label, the prediction, and the target.
  • Figure 4: Ablation on the effect of the teacher setup in SuperDiffusion. Results are reported on various datasets under different levels of label availability.
  • Figure 5: Ablation on the number of diffusion rounds $R$ in our approach. Results are reported on various datasets under different levels of label availability.
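The reconstruction step in the image pathway of Figure 2 inverts the forward process: given x_t and a predicted noise eps_hat, x0_hat = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t), as in Ho et al. (2020). The sketch below combines this inversion with a cycle-consistency loss. The mean-squared-error form of the loss, the noise schedule, and the `predict_noise` callable (a hypothetical stand-in for the UNet's image pathway, conditioned on the mask produced by the mask pathway) are assumptions rather than details confirmed by the paper.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def reconstruct_x0(x_t, eps_hat, t):
    """Invert x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps using the predicted noise."""
    ab = alpha_bars[t]
    return (x_t - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)

def cycle_consistency_loss(image, mask, t, predict_noise, rng):
    """Hypothetical image-pathway step: noise the image, predict that noise
    conditioned on the mask, reconstruct the image, and compare (MSE assumed)."""
    eps = rng.standard_normal(image.shape)
    ab = alpha_bars[t]
    x_t = np.sqrt(ab) * image + np.sqrt(1.0 - ab) * eps
    # Network input: noisy image concatenated with the mask along the channel axis
    eps_hat = predict_noise(np.concatenate([x_t, mask], axis=0), t)
    x0_hat = reconstruct_x0(x_t, eps_hat, t)
    return float(np.mean((x0_hat - image) ** 2))
```

The loss is zero only when the noise prediction is exact, so the gradient pushes the mask pathway toward masks that carry enough information about the image for the image pathway to denoise it.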