Structure and Progress Aware Diffusion for Medical Image Segmentation

Siyuan Song, Guyue Hu, Chenglong Li, Dengdi Sun, Zhe Jin, Jin Tang

TL;DR

This paper proposes a structure and progress-aware diffusion (SPAD) for medical image segmentation, which consists of a semantic-concentrated diffusion (ScD) and a boundary-centralized diffusion (BcD) modulated by a progress-aware scheduler (PaS).

Abstract

Medical image segmentation is crucial for computer-aided diagnosis and requires both understanding coarse morphological and semantic structures and carving fine boundaries. The morphological and semantic structures in medical images are beneficial and stable clues for target understanding. In contrast, the fine boundaries of medical targets (such as tumors and lesions) are usually ambiguous and noisy because of lesion overlap, annotation uncertainty, and similar factors, which makes them unreliable as early supervision. However, existing methods learn coarse structures and fine boundaries simultaneously throughout the training process. In this paper, we propose a structure and progress-aware diffusion (SPAD) for medical image segmentation, which consists of a semantic-concentrated diffusion (ScD) and a boundary-centralized diffusion (BcD) modulated by a progress-aware scheduler (PaS). Specifically, the semantic-concentrated diffusion introduces anchor-preserved target perturbation, which perturbs pixels within a medical target but preserves unaltered areas as semantic anchors, encouraging the model to infer noisy target areas from the surrounding semantic context. The boundary-centralized diffusion introduces progress-aware boundary noise, which blurs unreliable and ambiguous boundaries, thus compelling the model to focus on coarse but stable anatomical morphology and global semantics. Furthermore, the progress-aware scheduler gradually modulates the noise intensity of the ScD and BcD, forming a coarse-to-fine diffusion paradigm that encourages the model to focus on coarse morphological and semantic structures during early target-understanding stages and to gradually shift to fine target boundaries during later contour-adjusting stages.
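
The coarse-to-fine idea behind the progress-aware scheduler can be illustrated with a small sketch. The abstract does not give the schedule's functional form, so the cosine decay, the function name `pas_intensity`, and the `max_intensity` parameter below are all assumptions made for illustration, not the authors' formulation. The sketch only shows the qualitative behaviour: perturbation is strongest at the start of the process and vanishes at the end.

```python
import math


def pas_intensity(progress: float, max_intensity: float = 1.0) -> float:
    """Hypothetical progress-aware schedule (assumed cosine decay).

    progress: fraction of the process completed, clipped to [0, 1].
    Returns a noise intensity that starts at max_intensity and decays to 0.
    """
    progress = min(max(progress, 0.0), 1.0)
    return max_intensity * 0.5 * (1.0 + math.cos(math.pi * progress))


# Strong perturbation early (coarse structures), weak perturbation late (fine boundaries).
schedule = [pas_intensity(i / 10) for i in range(11)]
```

Any monotonically decreasing function would serve the same illustrative purpose. The cosine form is chosen here only because it changes slowly near both ends of the range.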

Paper Structure (23 sections, 7 equations, 6 figures, 4 tables)

This paper contains 23 sections, 7 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: (a) Beneficial and stable morphological and semantic structures. (b) Ambiguous and unreliable medical boundaries. (c) Conventional training paradigm simultaneously learns coarse structures and fine boundaries throughout the whole training process. (d) The proposed coarse-to-fine paradigm prioritizes acquiring coarse and stable structures and gradually shifts to carving fine and unreliable boundaries in medical images.
  • Figure 2: Overview of the proposed structure and progress-aware diffusion (SPAD) framework. The conditional diffusion model progressively denoises the final segmentation prediction $\textbf{x}_0$ from a random noise $\textbf{x}_T$ (initial noise input) and the perturbed segmentation image (progress-aware noise condition) step by step. The progress-aware noise condition is generated from the combination of boundary-centered perturbing and semantic-concentrated perturbing. At each step, the perturbing intensities are dynamically modulated by a progress-aware scheduler (PaS), enabling a progress-aware coordination of the coarse structures and fine boundaries in medical segmentation targets.
  • Figure 3: Illustration of the semantic-concentrated diffusion (ScD), which perturbs pixels within a specific medical target with progress-aware noise but preserves small unaltered areas as semantic anchors, encouraging structure-guided reasoning and context-based reconstruction.
  • Figure 4: Illustration of the boundary-centralized diffusion (BcD). It first extracts the target boundary with a contour detector (such as the Canny operator) from the ground-truth mask label, and then blurs the unreliable and ambiguous boundary with Gaussian noise to lower the reliance on uncertain boundaries during early target understanding stages.
  • Figure 5: Qualitative visualization of segmentation results on the AMD-SD dataset. From bottom to top: ground truth (GT), segmentation mask predicted by our SPAD, the SPAD without ScD (SPAD w/o ScD), the SPAD without BcD (SPAD w/o BcD), and the diffusion baseline without both ScD and BcD (Baseline). (1) The white boxes in the left four columns indicate representative regions where the baseline model fails to correctly localize target structures or produces incorrect predictions. Compared with the baseline, the methods incorporating ScD exhibit improved structural localization in these regions. (2) The green boxes in the right four columns indicate boundary-sensitive regions where the baseline produces imprecise or irregular contours. Compared with the baseline, the methods incorporating BcD yield more accurate boundary delineation. (3) When both ScD and BcD are incorporated, the predictions combine the advantages of both components and achieve the best results.
  • ...and 1 more figures
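
The two perturbations described in the captions for Figures 3 and 4 can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the function names, the `anchor_ratio` parameter, and the use of independent random pixels as anchors (the caption speaks of small unaltered areas) are hypothetical. The boundary is taken from the mask with a plain 4-neighbour test rather than the Canny operator mentioned in the caption, to keep the example NumPy-only.

```python
import numpy as np


def boundary_band(mask: np.ndarray) -> np.ndarray:
    """Target pixels that have at least one non-target 4-neighbour."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all in the target.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return m & ~interior


def scd_perturb(image, mask, intensity, anchor_ratio=0.1, rng=None):
    """Add Gaussian noise inside the target, leaving random anchor pixels untouched."""
    rng = np.random.default_rng() if rng is None else rng
    m = mask.astype(bool)
    anchors = rng.random(m.shape) < anchor_ratio  # assumed: anchors are random pixels
    perturb = m & ~anchors
    out = image.astype(np.float64).copy()
    noise = rng.normal(0.0, 1.0, size=out.shape)
    out[perturb] += intensity * noise[perturb]
    return out


def bcd_perturb(image, mask, intensity, rng=None):
    """Add Gaussian noise only on the boundary band of the target."""
    rng = np.random.default_rng() if rng is None else rng
    band = boundary_band(mask)
    out = image.astype(np.float64).copy()
    noise = rng.normal(0.0, 1.0, size=out.shape)
    out[band] += intensity * noise[band]
    return out


# Toy example: a 4x4 square target in an 8x8 image.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
image = np.ones((8, 8))
rng = np.random.default_rng(0)
scd_out = scd_perturb(image, mask, intensity=1.0, rng=rng)
bcd_out = bcd_perturb(image, mask, intensity=1.0, rng=rng)
```

In both functions `intensity` is the value a progress-aware scheduler would supply. Setting it to zero returns the input unchanged, which corresponds to the late stage where fine boundaries are no longer blurred.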