Diffusion-Based Quality Control of Medical Image Segmentations across Organs

Vincenzo Marcianò, Hava Chaptoukaev, Virginia Fernandez, M. Jorge Cardoso, Sébastien Ourselin, Michela Antonelli, Maria A. Zuluaga

Abstract

Medical image segmentation using deep learning (DL) has enabled the development of automated analysis pipelines for large-scale population studies. However, state-of-the-art DL methods are prone to hallucinations, which can result in anatomically implausible segmentations. Because manual correction is impractical at scale, automated quality control (QC) techniques are needed to address this challenge. While promising, existing QC methods are organ-specific, which limits their generalizability and usability beyond their original intended task. To overcome this limitation, we propose no-new Quality Control (nnQC), a robust QC framework based on a diffusion-generative paradigm that self-adapts to any input organ dataset. Central to nnQC is a novel Team of Experts (ToE) architecture, in which two specialized experts independently encode 3D spatial awareness, represented by the relative spatial position of an axial slice, and anatomical information derived from visual features of the original image. A weighted conditional module dynamically combines the two independent embeddings, or opinions, to condition the sampling mechanism within a diffusion process, enabling the generation of a spatially aware pseudo-ground truth for predicting QC scores. nnQC also integrates fingerprint adaptation to ensure adaptability across organs, datasets, and imaging modalities. We evaluated nnQC on seven organs using twelve publicly available datasets. Our results demonstrate that nnQC consistently outperforms state-of-the-art methods across all experiments, including cases where segmentation masks are highly degraded or completely missing, confirming its versatility and effectiveness across different organs.
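To make the Team of Experts conditioning more concrete, here is a minimal sketch of how two "opinions" could be produced and fused. It assumes that the positional expert encodes the relative axial position of a slice with sinusoidal features and that the weighted conditional module mixes the two embeddings with softmax-normalised weights. The sinusoidal encoder, the softmax weighting, and all function names are hypothetical illustrations, not nnQC's actual layers; in the real model the embeddings and weights are learned, and the anatomical embedding below is only a random stand-in for image features.

```python
import numpy as np


def relative_slice_position(slice_idx: int, num_slices: int) -> float:
    """Relative axial position in [0, 1] (assumed normalisation)."""
    if num_slices < 2:
        return 0.0
    return slice_idx / (num_slices - 1)


def sinusoidal_embedding(pos: float, dim: int) -> np.ndarray:
    """Encode a scalar position with sin/cos features (hypothetical positional expert).

    dim is assumed to be even; the output has 2 * (dim // 2) entries.
    """
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = pos * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def combine_opinions(e_pos: np.ndarray, e_anat: np.ndarray, logits: np.ndarray):
    """Mix two expert embeddings with softmax weights (assumed weighting scheme).

    The two logits would be learned in a real model; softmax makes the
    weights positive and sum to 1, so the condition c is a convex combination.
    """
    w = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    w = w / w.sum()
    c = w[0] * e_pos + w[1] * e_anat
    return c, w


# Usage: middle slice of a 25-slice volume, 8-dimensional embeddings.
dim = 8
pos = relative_slice_position(12, 25)  # 12 / 24 = 0.5
e_pos = sinusoidal_embedding(pos, dim)
e_anat = np.random.default_rng(0).standard_normal(dim)  # stand-in for image features
c, w = combine_opinions(e_pos, e_anat, np.array([0.0, 0.0]))  # equal weights
```

A softmax over two logits is one simple way to let a model shift trust between the spatial and anatomical expert per input; an attention- or gating-based module would serve the same role.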

Paper Structure

This paper contains 27 sections, 3 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The nnQC framework. For a 3D image–segmentation pair, dataset-specific fingerprints are extracted and used to preprocess the pair. Each axial segmentation slice and its corresponding 2D image are passed to the Team of Experts (ToE), which produces conditioning embeddings $c$ for the latent diffusion process. A VAE-GAN maps the 2D segmentation to be quality-checked into a latent space of high-quality segmentations, from which a DDIM-based Latent Diffusion Model (LDM) generates a pseudo-ground truth ($pGT$). A postprocessing step restores the $pGT$ to its original space.
  • Figure 2: Two-stage training and inference workflow. In the first stage (top left), the VAE is trained adversarially (i.e., as a VAE-GAN) to learn a rich latent space of high-quality segmentations (i.e., GTs). In the second training stage (bottom left), the LDM learns to reconstruct the noise, conditioned on embeddings from the Team of Experts (ToE) module (top right). The ToE's positional embedding is jointly optimized with the LDM. At inference (bottom right), Gaussian noise and $S_d$ are fed into the LDM; the ToE-generated condition $c$ guides the LDM to recover $z_0$, which is decoded by VAE$_D$ to generate the $pGT$.
  • Figure 3: Pearson correlation (r) between the predicted pseudo-quality scores and real scores (DSC and HD95) across different organs, modalities, and datasets. HD95 is not estimated for Liu et al. [liu2019alarm], as their model is designed to predict pseudo-DSCs only.
  • Figure 4: Mean Absolute Error (MAE) distribution across different organs, modalities, and datasets. The MAE is measured as the absolute difference between the predicted pseudo-quality scores and real scores (DSC and HD95). As in Figure 3, HD95 is not estimated for Liu et al. [liu2019alarm].
  • Figure 5: Learned normative manifolds and generated pGTs from a low-quality input segmentation from the ACDC dataset. The first column shows the GT and a low-quality segmentation overlaid on the original image. The following blocks display the latent spaces learned by different QC methods, the reconstructed pGTs (red box), and the reconstructed centroids (purple box). The projected manifolds are obtained using t-SNE [maaten2008visualizing].
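The evaluation in Figures 3 and 4 compares pseudo-quality scores, computed against the generated pGT, with real scores computed against the GT. The sketch below illustrates this for DSC only: a pseudo-DSC between each segmentation and a pGT, a real DSC against the GT, and agreement summarised with Pearson r and MAE. The toy square masks, the hand-made pGT stand-in, and the convention that two empty masks score 1.0 are assumptions for illustration; HD95 is omitted for brevity.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def pearson_r(x, y) -> float:
    """Pearson correlation between predicted and real quality scores."""
    return float(np.corrcoef(x, y)[0, 1])


def mae(x, y) -> float:
    """Mean absolute error between predicted and real quality scores."""
    return float(np.mean(np.abs(np.asarray(x) - np.asarray(y))))


# Toy 2D example: a square GT and an imperfect pGT (hand-made stand-in
# for a diffusion-generated pseudo-ground truth).
gt = np.zeros((32, 32), dtype=bool)
gt[8:24, 8:24] = True
pgt = np.zeros_like(gt)
pgt[9:24, 8:24] = True

# Segmentations of decreasing quality: progressively truncated squares.
segs = []
for k in range(4):
    s = np.zeros_like(gt)
    s[8 + 2 * k:24, 8:24] = True
    segs.append(s)

real_dsc = [dice(s, gt) for s in segs]     # needs the GT, unavailable at QC time
pseudo_dsc = [dice(s, pgt) for s in segs]  # uses only the pGT
print(f"r = {pearson_r(pseudo_dsc, real_dsc):.3f}, MAE = {mae(pseudo_dsc, real_dsc):.3f}")
```

The point of the comparison is that a QC method is useful when its pseudo-scores track the real scores closely (high r, low MAE) without ever seeing the GT.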