Clinical DVH metrics as a loss function for 3D dose prediction in head and neck radiotherapy

Ruochen Gao, Marius Staring, Frank Dankers

Abstract

Purpose: Deep-learning-based three-dimensional (3D) dose prediction is widely used in automated radiotherapy workflows. However, most existing models are trained with voxel-wise regression losses, which are poorly aligned with clinical plan evaluation criteria based on dose-volume histogram (DVH) metrics. This study aims to develop a clinically guided loss formulation that directly optimizes clinically used DVH metrics while remaining computationally efficient for head and neck (H&N) dose prediction.

Methods: We propose a clinical DVH metric loss (CDM loss) that incorporates differentiable D-metrics and surrogate V-metrics, together with a lossless bit-mask region-of-interest (ROI) encoding to improve training efficiency. The method was evaluated on 174 H&N patients using a temporal split (137 training, 37 testing).

Results: Compared with losses based on the mean absolute error (MAE) or on the DVH curve, CDM loss substantially improved target coverage and satisfied all clinical constraints. Using a standard 3D U-Net, the planning target volume (PTV) Score was reduced from 1.544 (MAE) to 0.491 (MAE + CDM), while organ-at-risk (OAR) sparing remained comparable. Bit-mask encoding reduced training time by 83% and lowered GPU memory usage.

Conclusion: Directly optimizing clinically used DVH metrics enables 3D dose predictions that are better aligned with clinical treatment planning criteria than conventional voxel-wise or DVH-curve-based supervision. The proposed CDM loss, combined with efficient ROI bit-mask encoding, provides a practical and scalable framework for H&N dose prediction.

Paper Structure

This paper contains 18 sections, 24 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Workflow of a typical deep-learning-based H&N dose prediction method. The planning CT and masks of the PTVs and OARs are provided as inputs to a U-shaped neural network to generate a 3D voxel-wise dose distribution. In H&N cases, the PTVs are typically assigned two or three dose levels, while the number of OARs is relatively large compared to other treatment sites.
  • Figure 2: (a) The 3D dose distribution within an ROI is flattened into a one-dimensional dose array $\{d_i\}_{i=1}^{N}$. Sorting in descending order gives $d^{\downarrow}_{1} \ge d^{\downarrow}_{2} \ge \dots \ge d^{\downarrow}_{N}$. A D-metric, such as $D_{x\%}$, is obtained by retrieving the dose value $d^{\downarrow}_k$ at the index $k$ that corresponds to the top $x\%$ portion of the ROI volume in the sorted dose array. (b) Using the same flattened dose array $\{d_i\}_{i=1}^{N}$, a V-metric such as $V_{x\%}$ is computed by applying a dose threshold equal to $x\%$ of the prescription dose, counting the voxels whose dose reaches this threshold, and expressing the count as a percentage of the ROI volume. For example, $V_{95\%}$ of $\mathrm{PTV}_{70}$ denotes the fraction of the $\mathrm{PTV}_{70}$ volume receiving at least $95\%$ of the 70 Gy dose level.
  • Figure 3: Multiple individual ROI masks are losslessly encoded into a single integer bit-mask, where each bit position represents a specific ROI (illustrated in pink). Because the raw bit-mask values span a wide numerical range, direct visualization is not intuitive. The bit-mask is therefore visualized in pseudo-color, with a color assigned to each unique value purely for display purposes. During decoding, the binary mask for ROI$_i$ is obtained by testing whether the $(i\!-\!1)$-th bit of the bit-mask is set, implemented on the GPU as a bitwise AND between the bit-mask and $(1 \ll (i-1))$; the left-shifted single-bit mask $(1 \ll (i-1))$ is illustrated in yellow. A nonzero result indicates voxel membership in ROI$_i$.
  • Figure 4: Boxplots of dose evaluation metrics for the clinically most relevant ROIs under different loss function configurations. (a) Boxplots for PTVs. (b) Boxplots for OARs. The corresponding planning aims and dose constraints are also shown (in clinical practice, the aims are regularly violated because of the proximity of the tumor). Parotid and submandibular glands were categorized as ipsilateral (Ipsi) or contralateral (Contra) according to the dose received. Statistical significance was tested using a two-tailed Wilcoxon signed-rank test. *: $p \leq 0.05$; ns: not significant.
  • Figure 5: Axial visualization of dose distributions under different loss function configurations. The red contour denotes the $\text{PTV}_{54.25}$, while the blue contour represents the $\text{PTV}_{70}$. A colormap routinely used in clinical practice is employed to enhance visual interpretability.
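
The D-metric and V-metric definitions in the caption of Figure 2 can be illustrated with a minimal sketch in plain Python on a flattened dose list. Several details here are assumptions rather than the paper's implementation: the function names, the index convention $k = \lceil xN/100 \rceil$, and in particular `soft_v_metric`, which replaces the hard threshold with a sigmoid as one common smooth relaxation and is not necessarily the surrogate used in the CDM loss. A training loss would apply the same operations to differentiable tensors instead of lists.

```python
import math


def d_metric(doses, x_percent):
    """D_x%: dose at the index covering the hottest x% of the ROI volume."""
    if not doses:
        raise ValueError("ROI contains no voxels")
    sorted_desc = sorted(doses, reverse=True)
    n = len(sorted_desc)
    # Assumed index convention: k = ceil(x * N / 100), clamped to [1, N].
    k = min(n, max(1, math.ceil(x_percent * n / 100.0)))
    return sorted_desc[k - 1]


def v_metric(doses, x_percent, prescription_gy):
    """V_x%: percentage of ROI voxels receiving at least x% of the prescription."""
    if not doses:
        raise ValueError("ROI contains no voxels")
    threshold = x_percent * prescription_gy / 100.0
    count = sum(1 for d in doses if d >= threshold)
    return 100.0 * count / len(doses)


def soft_v_metric(doses, x_percent, prescription_gy, temperature_gy=0.5):
    """Smooth stand-in for V_x% (assumption: sigmoid relaxation of the step)."""
    if not doses:
        raise ValueError("ROI contains no voxels")
    threshold = x_percent * prescription_gy / 100.0
    # sigmoid(z) written as 0.5 * (1 + tanh(z / 2)) to avoid overflow in exp.
    soft_count = sum(
        0.5 * (1.0 + math.tanh((d - threshold) / (2.0 * temperature_gy)))
        for d in doses
    )
    return 100.0 * soft_count / len(doses)


# Toy ROI of 10 voxels with a 70 Gy prescription (illustrative values only).
toy_doses = [72.0, 71.0, 70.0, 69.0, 68.0, 67.0, 66.0, 65.0, 64.0, 60.0]
print(d_metric(toy_doses, 95))        # 60.0
print(v_metric(toy_doses, 95, 70.0))  # 60.0
```

In this toy example the 95% threshold is 66.5 Gy, which six of the ten voxels reach, so $V_{95\%}$ is 60%. As the temperature of the sigmoid shrinks, the soft count approaches the hard count, at the cost of vanishing gradients away from the threshold.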
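
The bit-mask encoding and decoding described in the caption of Figure 3 can be sketched in the same way. This is a plain-Python illustration on flattened masks, with hypothetical function names; the paper performs the decoding as a bitwise AND on the GPU, which this sketch mirrors voxel by voxel rather than reproducing.

```python
def encode_bitmask(roi_masks):
    """Pack per-ROI binary masks into one integer per voxel.

    roi_masks is a list of flat 0/1 lists of equal length; ROI i (1-based)
    is stored in bit position i - 1.
    """
    if not roi_masks:
        raise ValueError("at least one ROI mask is required")
    n_voxels = len(roi_masks[0])
    if any(len(mask) != n_voxels for mask in roi_masks):
        raise ValueError("all ROI masks must have the same length")
    codes = [0] * n_voxels
    for bit, mask in enumerate(roi_masks):
        for voxel, inside in enumerate(mask):
            if inside:
                codes[voxel] |= 1 << bit
    return codes


def decode_roi(codes, i):
    """Recover the binary mask of ROI_i (1-based) with a bitwise AND."""
    probe = 1 << (i - 1)  # single-bit mask, left-shifted to position i - 1
    return [1 if code & probe else 0 for code in codes]


# Three toy ROIs over four voxels; ROI 1 and ROI 2 overlap at voxel 1.
toy_masks = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
toy_codes = encode_bitmask(toy_masks)
print(toy_codes)                 # [1, 3, 2, 0]
print(decode_roi(toy_codes, 2))  # [0, 1, 1, 0]
```

Because each ROI owns its own bit, overlapping ROIs are preserved exactly, which is what makes the encoding lossless: decoding every bit position returns the original masks.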