See More, Change Less: Anatomy-Aware Diffusion for Contrast Enhancement

Junqi Liu, Zejun Wu, Pedro R. A. S. Bassi, Xinze Zhou, Wenxuan Li, Ibrahim E. Hamamci, Sezgin Er, Tianyu Lin, Yi Luo, Szymon Płotka, Bjoern Menze, Daguang Xu, Kai Ding, Kang Wang, Yang Yang, Yucheng Tang, Alan L. Yuille, Zongwei Zhou

TL;DR

SMILE tackles the risk of anatomically inaccurate enhancement in medical CT by introducing an anatomy-aware diffusion framework trained with structure, phase, and intensity supervision. It operates without voxel-level registration, leveraging the large, richly annotated CTVerse dataset to ensure anatomical fidelity and physiologically correct contrast dynamics. Across six external datasets, SMILE delivers consistent improvements in image quality (SSIM/PSNR/FID) and preserves organ integrity (HU and size correlations above 0.95), while also boosting tumor detection from non-contrast CT. The approach promises clinically reliable contrast enhancement that can aid screening and decision-making without extra scans or contrast exposure.

Abstract

Image enhancement improves visual quality and helps reveal details that are hard to see in the original image. In medical imaging, it can support clinical decision-making, but current models often over-edit. This can distort organs, create false findings, and miss small tumors because these models do not understand anatomy or contrast dynamics. We propose SMILE, an anatomy-aware diffusion model that learns how organs are shaped and how they take up contrast. It enhances only clinically relevant regions while leaving all other areas unchanged. SMILE introduces three key ideas: (1) structure-aware supervision that follows true organ boundaries and contrast patterns; (2) registration-free learning that works directly with unaligned multi-phase CT scans; (3) unified inference that provides fast and consistent enhancement across all contrast phases. Across six external datasets, SMILE outperforms existing methods in image quality (14.2% higher SSIM, 20.6% higher PSNR, 50% better FID) and in clinical usefulness by producing anatomically accurate and diagnostically meaningful images. SMILE also improves cancer detection from non-contrast CT, raising the F1 score by up to 10 percent.

Paper Structure

This paper contains 32 sections, 7 equations, 16 figures, 5 tables.

Figures (16)

  • Figure 1: A. Natural image enhancement. When generative models add makeup to a photo, they often do too much: they change not just the face, but the hair and clothes as well. That’s fine for social media filters, but in medical imaging, such 'over-creativity' can hide real tumors or create fake ones. B. Medical image enhancement. Doctors use contrast agents (a liquid injected into the body) to make internal organs easier to see. CT scans are acquired at four time points: (1) Non-contrast (N) is the baseline before enhancement; (2) Arterial (A) highlights arteries; (3) Venous (V) enhances organs such as the liver and spleen; and (4) Delay (D) mainly shows the urinary system. The contrast agent makes certain tissues absorb more X-rays and appear brighter, revealing subtle tumors or vascular structures that would otherwise be invisible. These phases reflect how contrast flows through organs, helping radiologists detect and diagnose disease more accurately.
  • Figure 2: SMILE employs anatomy-aware supervision. Our framework integrates multiple anatomical constraints to guide contrast-phase enhancement. The structural segmentation loss ($\mathcal{L}_{\text{seg}}$) and cycle consistency loss ($\mathcal{L}_{\text{cyc}}$) preserve structural fidelity. The phase classification loss ($\mathcal{L}_{\text{cls}}$) ensures the enhanced CT shows the correct contrast-phase characteristics. Finally, the intensity HU loss ($\mathcal{L}_{\text{HU}}$) and air/bone loss ($\mathcal{L}_{\text{AB}}$) enforce realistic organ enhancement (mainly abdominal organs, e.g., the liver) and maintain consistency in air and bone regions that should remain unchanged. As demonstrated in the figure, SMILE does not require registration between the enhancement source and the ground truth.
  • Figure 3: CTVerse sets a new standard for multi-phase CT benchmarks by offering the most comprehensive, finely annotated organ and tumor labels. We introduce CTVerse, a high-quality, multi-phase, and precisely annotated CT dataset designed for training and evaluating generative models. A. CTVerse provides detailed annotations for multiple tumor types (pancreas, liver, and kidney). B. CTVerse contains voxel-wise annotations for 88 anatomical structures, including organs at risk, vessels, bones, etc. C. Compared to existing publicly available datasets [li2024abdomenatlas, li2025pants, bassi2025radgpt, qu2023annotating, li2024well, li2024medshapenet, chou2024embracing, liu2024universal, liu2023clip, chen2025vision], our CTVerse includes at least 1.5 times more labeled structures, making it a strong benchmark for generative models and medical research.
  • Figure 4: SMILE ensures both structural consistency and intensity accuracy on enhanced scans, outperforming all competing models in both anatomical integrity and tumor visibility. We evaluate whether the enhanced CT scans remain anatomically correct (no extra structures, no distortions, high image quality), and whether the added contrast is diagnostically correct, in the sense that it allows the tumor to be clearly detected. Baselines used (labeled 1–9): (1) Pix2Pix [isola2017image], (2) CycleGAN [chu2017cyclegan], (3) CyTran [ristea2023cytran], (4) DALL-E [esser2021taming], (5) MedDiffusion [khader2023denoising], (6) CUT [park2020contrastive], (7) ChatGPT-5.1 [chatgpt51], (8) Google Nano Banana [google_nanobanana], (9) Qwen-3 Max [qwen3max]. The last three large vision models are included only for visual reference to show that general image-editing systems cannot handle medical CT enhancement reliably. A. Structural Consistency. We evaluate SMILE's structural fidelity by comparing its non-contrast CT enhancement results against those of the nine baseline models. Compared to the other methods, SMILE preserves organ boundaries and global anatomy without introducing extra structures or phase-inconsistent artifacts. This shows that SMILE maintains high structural reliability even under large intensity shifts. B. Intensity Accuracy. To further validate enhancement correctness, we apply a state-of-the-art tumor detector [li2025scalemai] to the enhanced arterial, venous, and delay phases. In all three enhanced phases, the tumor is successfully detected (marked by the black circle), confirming that SMILE restores the clinically meaningful contrast cues needed for downstream diagnostic tasks.
  • Figure 5: Across all phases and organs, SMILE achieves an average HU and size correlation above 0.95, demonstrating strong anatomical and intensity consistency with real CT scans. We evaluate how well SMILE preserves organ intensity (HU) and organ volume compared with the real CT scans across 22 organs. Here, HU is averaged over the whole organ rather than compared per voxel. Each large plot corresponds to a target phase (N: non-contrast, A: arterial, V: venous, D: delay), and the four small plots on the right show results when enhancing from other source phases. Together, the smaller plots cover all source→target phase pairs, showing that SMILE delivers consistently strong organ HU and size alignment regardless of the enhancement direction.
  • ...and 11 more figures
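
The organ-level agreement described in the Figure 5 caption (HU averaged over each whole organ, plus organ size, each correlated against the real scan) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `organ_stats` and `pearson`, the `voxel_volume_mm3` parameter, the use of Pearson correlation, and the toy volumes are all hypothetical. Because the statistics are computed per organ from each scan's own label map, the two scans do not need to be voxel-aligned.

```python
import numpy as np


def organ_stats(ct_hu, labels, organ_ids, voxel_volume_mm3=1.0):
    """Return (mean HU, volume in mm^3) per organ for one scan.

    ct_hu and labels are arrays of the same shape; labels holds one
    integer ID per voxel (0 is treated as background here).
    """
    mean_hu, volume = [], []
    for organ_id in organ_ids:
        mask = labels == organ_id
        n_voxels = int(mask.sum())
        # An organ missing from the label map yields NaN and is skipped later.
        mean_hu.append(float(ct_hu[mask].mean()) if n_voxels else float("nan"))
        volume.append(n_voxels * voxel_volume_mm3)
    return np.array(mean_hu), np.array(volume)


def pearson(x, y):
    """Pearson correlation over entries that are valid in both arrays."""
    keep = ~(np.isnan(x) | np.isnan(y))
    return float(np.corrcoef(x[keep], y[keep])[0, 1])


def fill(labels, hu_by_organ, background_hu=-1000.0):
    """Build a toy CT volume with one constant HU value per organ."""
    ct = np.full(labels.shape, background_hu)
    for organ_id, hu in hu_by_organ.items():
        ct[labels == organ_id] = hu
    return ct


# Toy "real" scan: three organs in an 8x8x8 volume.
real_labels = np.zeros((8, 8, 8), dtype=int)
real_labels[0:2] = 1
real_labels[2:5] = 2
real_labels[5:8, :4] = 3

# Toy "enhanced" scan: organ 3 is slightly larger and the volume is flipped,
# so the two scans are deliberately not voxel-aligned.
enh_labels = np.zeros((8, 8, 8), dtype=int)
enh_labels[0:2] = 1
enh_labels[2:5] = 2
enh_labels[5:8, :5] = 3
enh_labels = np.flip(enh_labels, axis=0)

real_ct = fill(real_labels, {1: 50.0, 2: 120.0, 3: 200.0})
enh_ct = fill(enh_labels, {1: 55.0, 2: 118.0, 3: 205.0})

organ_ids = [1, 2, 3]
real_hu, real_vol = organ_stats(real_ct, real_labels, organ_ids)
enh_hu, enh_vol = organ_stats(enh_ct, enh_labels, organ_ids)

hu_r = pearson(real_hu, enh_hu)
size_r = pearson(real_vol, enh_vol)
print(f"HU correlation: {hu_r:.3f}, size correlation: {size_r:.3f}")
```

In practice the label maps would come from a segmentation model or manual annotation of each scan, and `voxel_volume_mm3` would be derived from the scan's voxel spacing.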