
Conditional Diffusion Model with Anatomical-Dose Dual Constraints for End-to-End Multi-Tumor Dose Prediction

Hui Xie, Haiqin Hu, Lijuan Ding, Qing Li, Yue Sun, Tao Tan

TL;DR

The paper tackles radiotherapy dose prediction by introducing ADDiff-Dose, a conditional diffusion model that operates in a compressed latent space produced by LightweightVAE3D and is guided by multimodal inputs, including tumor/OAR masks, beam parameters, and more than 50 clinical dose-volume constraints. A composite loss and an organ-existence gating mechanism ensure both dosimetric accuracy and strict clinical compliance, while a two-stage training regime enables end-to-end multi-tumor prediction across head-and-neck and lung cases. Empirical results on a large public dataset and three external cohorts show state-of-the-art MAE (≈0.101 Gy), Dice (≈0.927), and tight constraint adherence (e.g., a spinal cord D_max error near 0.0005 Gy), with plan-generation times of around 22 seconds. The work demonstrates strong generalization, robustness, and clinical relevance, offering a scalable, automated alternative to traditional trial-and-error planning and a foundation for future expansion to more tumor sites and treatment techniques.
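
The interplay between the composite loss and organ-existence gating can be illustrated with a small NumPy sketch. It is not the authors' implementation. The following are all assumptions: the hinge-squared penalty on each organ's maximum dose (standing in for the "conditional terms"), the loss weights `lambda_cond` and `lambda_kl`, and the helper names. The sketch also covers only D_max-type limits rather than the full set of dose-volume constraints. The gate skips any OAR whose mask is empty, so a lung case is not penalized for head-and-neck organs it does not contain.

```python
import numpy as np

def gated_constraint_penalty(dose, oar_masks, dmax_limits):
    """Hinge-squared penalty on per-organ max dose, gated by organ existence.

    dose:        (D, H, W) predicted dose in Gy.
    oar_masks:   (K, D, H, W) boolean masks, one per OAR (may be all False).
    dmax_limits: (K,) hypothetical D_max limits in Gy.
    """
    penalty, n_present = 0.0, 0
    for mask, limit in zip(oar_masks, dmax_limits):
        if not mask.any():      # existence gate: organ not contoured in this case
            continue            # so it contributes nothing to the loss
        organ_dmax = dose[mask].max()
        penalty += max(organ_dmax - limit, 0.0) ** 2
        n_present += 1
    return penalty / max(n_present, 1)  # average over organs that exist

def kl_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), averaged over latent elements."""
    return 0.5 * np.mean(np.exp(log_var) + mu**2 - 1.0 - log_var)

def composite_loss(eps_pred, eps_true, dose_pred, oar_masks, dmax_limits,
                   mu, log_var, lambda_cond=0.1, lambda_kl=1e-3):
    """Hypothetical composite loss: denoising MSE + gated constraint term + KL."""
    mse = np.mean((eps_pred - eps_true) ** 2)
    cond = gated_constraint_penalty(dose_pred, oar_masks, dmax_limits)
    kl = kl_standard_normal(mu, log_var)
    return mse + lambda_cond * cond + lambda_kl * kl
```

For example, if the hottest voxel inside a contoured OAR reaches 50 Gy against a 45 Gy limit, that organ contributes a penalty of 25, while an uncontoured organ contributes nothing regardless of its limit.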

Abstract

Radiotherapy treatment planning often relies on time-consuming, trial-and-error adjustments that depend heavily on the expertise of specialists, while existing deep learning methods face limitations in generalization, prediction accuracy, and clinical applicability. To tackle these challenges, we propose ADDiff-Dose, an Anatomical-Dose Dual Constraints Conditional Diffusion Model for end-to-end multi-tumor dose prediction. The model employs LightweightVAE3D to compress high-dimensional CT data and integrates multimodal inputs, including target and organ-at-risk (OAR) masks and beam parameters, within a progressive noise-addition and denoising framework. It incorporates conditional features via a multi-head attention mechanism and uses a composite loss function combining MSE, conditional terms, and KL divergence to ensure both dosimetric accuracy and compliance with clinical constraints. Evaluation on a large-scale public dataset (2,877 cases) and three external institutional cohorts (450 cases in total) demonstrates that ADDiff-Dose significantly outperforms traditional baselines, achieving an MAE of 0.101-0.154 (compared with 0.316 for UNet and 0.169 for GAN models), a Dice coefficient of 0.927 (a 6.8% improvement), and a spinal cord maximum dose error within 0.1 Gy. The average plan generation time is reduced to 22 seconds per case. Ablation studies confirm that the structural encoder improves compliance with clinical dose constraints by 28.5%. To our knowledge, this is the first study to introduce a conditional diffusion model framework for radiotherapy dose prediction, offering a generalizable and efficient solution for automated treatment planning across diverse tumor sites, with the potential to substantially reduce planning time and improve clinical workflow efficiency.
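
The two headline metrics can be computed as in the NumPy sketch below. The abstract does not state the region or threshold behind the reported Dice of 0.927, so this sketch makes two assumptions. First, MAE is the mean absolute voxel-wise dose difference, optionally restricted to a body mask. Second, Dice compares the isodose regions obtained by thresholding the predicted and ground-truth dose maps at the same level. The function names are illustrative.

```python
import numpy as np

def dose_mae(pred, gt, body_mask=None):
    """Mean absolute dose error (Gy), optionally restricted to a body mask."""
    diff = np.abs(pred - gt)
    return float(diff[body_mask].mean() if body_mask is not None else diff.mean())

def isodose_dice(pred, gt, threshold):
    """Dice overlap between the regions receiving >= threshold Gy."""
    a = pred >= threshold
    b = gt >= threshold
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both isodose regions empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)
```

With ground truth `[0, 10, 60, 60]` Gy and prediction `[1, 12, 58, 40]` Gy, the MAE is 6.25 Gy. At a 50 Gy threshold, the two isodose regions share one of three flagged voxels, which gives a Dice of 2/3.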

Paper Structure

This paper contains 17 sections, 9 equations, 7 figures, 6 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overall Framework of the Anatomical-Dose Dual Constraints Conditional Diffusion Model (ADDiff-Dose). The schematic illustrates the end-to-end conditional diffusion architecture for multi-tumor radiotherapy dose prediction. The top path shows the forward diffusion process, in which the initial latent representation $Z_0$ is progressively noised over $T$ steps to become $Z_T$. The bottom path shows the reverse denoising process, in which a U-Net iteratively refines $Z_T$ back to $Z_0$, conditioned on multimodal inputs including clinical priors, PTV/OAR masks, class labels, positional information, and beam parameters. The final predicted dose is obtained by decoding the denoised $Z_0$.
  • Figure 2: Architecture Diagram of the Variational Autoencoder (VAE). The upper encoding path uses layers such as Conv3d, BatchNorm3d, StableSiLU, and AdaptiveAvgPool3d to produce the mean $\boldsymbol{\mu}$ and the log-variance Log(var), from which the latent representation is sampled. The lower decoding path uses ConvTranspose3d, StableSiLU, and Sigmoid layers to reconstruct the output from the latent space, completing the VAE workflow.
  • Figure 3: Architecture Diagram of the U-Net Model with Modified Attention and ResBlock Components. It shows the encoding (En1-En4) and decoding (De1-De4) paths, which incorporate components such as Cascade - GroupNorm - StableSiLU, ResBlock, and MultiHeadAttention, along with the detailed internal structures of the ResBlock and MultiHeadAttention modules at the bottom.
  • Figure 4: Visualization and Comparison of Dose Prediction Results and Differences of Different Models for Lung Tumors. Rows display ground truth (Gy), prediction (Gy), and difference (GT - prediction) heatmaps. Columns compare models (UNet, DeepLabV3, GAN, DoseNet, DoseDiff, MD-Dose, Baseline, Proposed), with color bars indicating dose value scales, enabling assessment of prediction accuracy and model performance differences.
  • Figure 5: Visualization and Comparison of Dose Prediction Results and Differences of Different Models for Head and Neck Tumors. The figure presents three rows of heatmaps: the top row shows ground truth dose distributions (Gy), the middle row displays dose predictions from models (UNet, DeepLabV3, GAN, DoseNet, DoseDiff, MD-Dose, Baseline, Proposed), and the bottom row illustrates the difference (ground truth - prediction). Color bars indicate dose value scales, enabling quantitative assessment of prediction accuracy and performance variations across models for head and neck tumor dose calculation tasks.
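
The sketch below shows the forward and reverse paths in Figure 1 and the latent sampling in Figure 2, assuming the standard DDPM and VAE formulations. The forward process jumps directly to step $t$ via $Z_t = \sqrt{\bar\alpha_t}\,Z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, and each reverse step uses a noise estimate that, in ADDiff-Dose, would come from the conditional U-Net. Several details are assumptions, not taken from this summary: the linear schedule with common DDPM defaults ($T = 1000$, $\beta$ from $10^{-4}$ to $0.02$), the ε-prediction parameterization, the latent shape, and all function names.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed DDPM-style linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t, strictly decreasing
    return betas, alphas, alpha_bars

def reparameterize(mu, log_var, rng):
    """VAE latent sampling: z = mu + sigma * eps, with sigma = exp(0.5 * log_var)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def q_sample(z0, t, alpha_bars, noise):
    """Forward process: go from Z_0 to Z_t in closed form."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * noise

def predict_z0(zt, t, alpha_bars, eps_hat):
    """Invert q_sample given a noise estimate eps_hat."""
    ab = alpha_bars[t]
    return (zt - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)

def p_step(zt, t, betas, alphas, alpha_bars, eps_hat, rng):
    """One ancestral DDPM reverse step Z_t -> Z_{t-1}, with sigma_t^2 = beta_t."""
    mean = (zt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(zt.shape)
```

At inference time, sampling would start from Gaussian noise at $t = T-1$ and call `p_step` down to $t = 0$. The conditioning on masks, beam parameters, and clinical priors enters only through `eps_hat`, and the VAE decoder then maps the final latent back to a dose volume.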