Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation

Yuran Wang, Zhijing Wan, Yansheng Qiu, Zheng Wang

TL;DR

This work tackles the scarcity of high-quality unlabeled data for self-supervised learning (SSL) in medical imaging by presenting Locality-Aware Diffusion (Lad), a three-phase framework that generates high-fidelity 3D abdominal CT volumes. Lad constructs a locality-focused latent space using VQ-GAN, fits a diffusion model in that space guided by anatomical priors extracted from predicted organ masks, and samples with augmented locality conditions to produce large, diverse synthetic datasets. Key contributions include a Locality Loss that emphasizes fine-grained abdominal structures, a Condition Extractor that pairs content features with structure features derived from Betti numbers, and locality-conditioned sampling with classifier-free guidance. Experimental results on AbdomenCT-1K and TotalSegmentator show that Lad achieves state-of-the-art realism and diversity and substantially improves self-supervised organ segmentation performance, especially for small organs such as the pancreas and spleen, highlighting the practical impact of synthetic data for SSL in medical imaging.
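The classifier-free guidance step mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which the sampler blends an unconditional and a conditional noise prediction; the function name `guided_noise`, the plain-list representation, and the scale convention are illustrative assumptions, not details taken from the paper.

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Blend two noise predictions as in classifier-free guidance.

    Assumed convention: eps = eps_uncond + s * (eps_cond - eps_uncond).
    With s = 0 the condition is ignored, with s = 1 the conditional
    prediction is used as-is, and s > 1 extrapolates toward the condition.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy predictions standing in for the denoiser outputs at one sampling step.
eps_u = [0.0, 1.0]   # prediction without the locality condition
eps_c = [1.0, 3.0]   # prediction with the locality condition
print(guided_noise(eps_u, eps_c, 2.0))  # [2.0, 5.0]
```

In a real sampler the two predictions would come from the same denoising network evaluated with and without the locality condition; the blend itself is the only part shown here.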

Abstract

In medical image analysis, self-supervised learning (SSL) techniques have emerged to alleviate labeling demands, yet they still face a scarcity of training data owing to escalating resource requirements and privacy constraints. Numerous efforts employ generative models to generate high-fidelity, unlabeled 3D volumes across diverse modalities and anatomical regions. However, the intricate and hard-to-distinguish anatomical structures within the abdomen pose a unique challenge to abdominal CT volume generation compared to other anatomical regions. To address this overlooked challenge, we introduce Locality-Aware Diffusion (Lad), a novel method tailored for detailed 3D abdominal CT volume generation. We design a locality loss to refine crucial anatomical regions and devise a condition extractor to integrate abdominal priors into generation, thereby enabling the generation of large quantities of high-quality abdominal CT volumes essential for SSL tasks without the need for additional data such as labels or radiology reports. Volumes generated by our method reproduce abdominal structures with remarkable fidelity, reducing the FID score from 0.0034 to 0.0002 on the AbdomenCT-1K dataset, closely mirroring authentic data and surpassing current methods. Extensive experiments demonstrate the effectiveness of our method in self-supervised organ segmentation tasks, where it improves mean Dice scores on two abdominal datasets. These results underscore the potential of synthetic data to advance self-supervised learning in medical image analysis.

Paper Structure (36 sections, 4 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Our Motivation. (a) Change in Dice score for the self-supervised segmentation model SSL-ALPNet [ouyang2022self] when trained on the augmented AbdomenCT-1K dataset (augmented with volumes synthesized by Medical Diffusion [khader2023denoising]) relative to training on the original AbdomenCT-1K dataset. Despite the synthetic data augmentation, SSL performance drops notably, indicating the limited utility of sub-optimal synthetic abdominal CT data. (b) Comparison of real and synthetic abdominal CT images. The outline of the pancreas is much more distinct in the real image than in the image synthesized by Medical Diffusion.
  • Figure 2: Overview of Locality-Aware Diffusion (Lad). Lad consists of three phases: (a) Latent space construction. We introduce a locality refinement module into VQ-GAN training. The refinement module uses the Locality Loss $\mathcal{L}_{loc}$ to help the VQ-GAN learn more details of anatomical structures in the low-dimensional space. (b) Diffusion fitting in latent space. We introduce a locality condition extraction module into diffusion training. The diffusion model generates volumes conditioned on a locality condition extracted by the Condition Extractor $E_{c}$. (c) Sampling in latent space. We introduce a locality condition augmentation module into diffusion sampling. In the augmentation module, diverse conditions are extracted from the augmented mask set and used to generate a large number of volumes. All three phases use the masks output by a prior extraction module: the masks are predicted by a well-trained universal segmentation model, UniverSeg, and are treated as priors.
  • Figure 3: Locality Condition Extraction. (a) This module extracts features from the priors to guide generation precisely through two complementary sub-modules, the Content Extractor $E_{con}$ and the Structure Extractor $E_{str}$. (b) The Structure Extractor $E_{str}$ encodes the topological features of slice $y^i_j$ of mask $y^i$ as a one-dimensional vector of length 6, built from the Betti numbers of each label.
  • Figure 4: Comparison of Synthetic Volumes Embedding From Different Methods. Features extracted from synthetic volumes are embedded into a 2-dimensional space using MDS, with ellipses fitted to method-specific scatter plots for improved clarity. Both (a) and (b) show that Lad exhibits the highest overlap with real volumes.
  • Figure 5: Visualization of Volumes Generated by Different Methods. The first three columns display the 8th, 16th, and 24th slices of synthetic volumes on the AbdomenCT-1K dataset, while the last three columns display those on the TotalSegmentator dataset. Samples from each method are presented in two rows: the first row shows the entire slice, while the second row zooms in on a local area of the image. Lad produces clearer anatomical structures than the other methods.
  • ...and 1 more figure
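The Figure 3(b) caption describes turning a mask slice into a length-6 vector of Betti numbers. The sketch below shows one way such a vector could be computed for a 2D slice. It is not the authors' implementation. It assumes three organ labels with two Betti numbers each (b0, the number of connected components, and b1, the number of holes), which is only one layout that yields six entries. The connectivity choice (8-connected foreground, 4-connected background) and all function names are likewise illustrative assumptions.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, target, neighbors):
    """Count connected regions of cells equal to `target` via breadth-first search."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(mask_slice, label):
    """Return (b0, b1) of the region carrying `label` in a 2D mask slice."""
    binary = [[1 if v == label else 0 for v in row] for row in mask_slice]
    width = len(binary[0])
    # Pad with background so the outer background forms a single component.
    padded = ([[0] * (width + 2)]
              + [[0] + row + [0] for row in binary]
              + [[0] * (width + 2)])
    b0 = count_components(padded, 1, N8)       # connected components
    b1 = count_components(padded, 0, N4) - 1   # enclosed background regions (holes)
    return b0, b1


def structure_vector(mask_slice, labels=(1, 2, 3)):
    """Concatenate (b0, b1) per label; three labels are an assumption."""
    vec = []
    for label in labels:
        vec.extend(betti_numbers(mask_slice, label))
    return vec


# Toy slice: label 1 is a ring with one hole, label 2 has two blobs, label 3 is absent.
example = [
    [1, 1, 1, 0, 2, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 2, 2],
]
print(structure_vector(example))  # [1, 1, 2, 0, 0, 0]
```

Pairing 8-connectivity for the foreground with 4-connectivity for the background is a standard way to keep the two counts topologically consistent on a pixel grid; a library routine for connected-component labeling could replace the hand-written search.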