
Latent Spaces Enable Transformer-Based Dose Prediction in Complex Radiotherapy Plans

Edward Wang, Ryan Au, Pencilla Lang, Sarah A. Mattonen

TL;DR

This work introduces LDFormer, a two-stage latent transformer for fast dose prediction in multi-lesion lung SABR plans. Anatomy and dose are first encoded into discrete latent spaces with VQVAEs; a decoder-only transformer then predicts the dose latent, using modified causal masking to handle a variable number of lesions. LDFormer generates 3-D dose predictions in under 30 s on consumer hardware and outperforms a state-of-the-art GAN on PTV conformality, with a wider gap for overlapping lesions. The approach could give radiation oncologists rapid decision support, reduce planning time and resource burden, and help accelerate adoption of multi-lesion SABR planning. Larger training datasets and prospective validation remain areas for future work.

Abstract

Evidence is accumulating in favour of using stereotactic ablative body radiotherapy (SABR) to treat multiple cancer lesions in the lung. Multi-lesion lung SABR plans are complex and require significant resources to create. In this work, we propose a novel two-stage latent transformer framework (LDFormer) for dose prediction of lung SABR plans with varying numbers of lesions. In the first stage, patient anatomical information and the dose distribution are encoded into a latent space. In the second stage, a transformer learns to predict the dose latent from the anatomical latents. Causal attention is modified to adapt to different numbers of lesions. LDFormer outperforms a state-of-the-art generative adversarial network on dose conformality in and around lesions, and the performance gap widens when considering overlapping lesions. LDFormer generates predictions of 3-D dose distributions in under 30 s on consumer hardware, and has the potential to assist physicians with clinical decision-making, reduce resource costs, and accelerate treatment planning.
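
The abstract says causal attention is modified to cope with different numbers of lesions, but this page does not say how. As a rough illustration only, the sketch below assumes a prefix-style mask: every anatomical (context) token is visible to all tokens, while dose tokens attend causally to earlier dose tokens. The function names, the token layout, and the figure of 4 tokens per latent representation (taken from the 2x2 depiction in Figure 1) are assumptions for this example, not details confirmed by the paper.

```python
# Hypothetical sketch of a prefix-style attention mask; not the authors' code.
# Assumed layout: [context tokens ..., dose tokens ...]

TOKENS_PER_LR = 4  # assumption: Figure 1 depicts each latent representation as 2x2


def context_length(n_ptvs: int, n_other_inputs: int = 2) -> int:
    """Number of context tokens for a plan with n_ptvs lesions.

    n_other_inputs counts the non-PTV inputs (assumed here: OARs and the
    initial dose estimate), each contributing one latent representation.
    """
    return TOKENS_PER_LR * (n_other_inputs + n_ptvs)


def build_attention_mask(n_context: int, n_dose: int) -> list[list[bool]]:
    """Return mask[i][j] == True when token i may attend to token j."""
    total = n_context + n_dose
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < n_context:
                # Context tokens are fully visible, regardless of how many there are.
                mask[i][j] = True
            else:
                # Dose tokens are causal: only a dose token at position i >= j sees j.
                mask[i][j] = i >= n_context and j <= i
    return mask
```

Under these assumptions, adding a lesion only lengthens the context prefix: `build_attention_mask(context_length(3), TOKENS_PER_LR)` yields a larger mask than the one-lesion case, while the causal pattern over the dose tokens stays the same.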

Paper Structure (12 sections, 2 equations, 3 figures, 5 tables)

This paper contains 12 sections, 2 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The overall workflow is shown. A: Vector-quantized variational autoencoders (VQVAEs) are trained to encode organs at risk (OARs), planning target volumes (PTVs), and dose into latent representations (LRs). B: The transformer is trained to predict the dose LR from LRs of the OARs, initial dose estimate and PTVs concatenated with the slice index and prescription. The dose LR is then decoded into a dose distribution. For simplicity, LRs are depicted as 2x2, and only one PTV is shown.
  • Figure 2: Axial, coronal, and sagittal views of the (A) ground truth, (B) LDFormer and (C) GAN doses are shown for two testing set patients with overlapping lesions. Arrows indicate hotspots in overlapping lesions. The unit of the colourbar is EQD2 Gy ($\frac{\alpha}{\beta}=3$).
  • Figure S1: The architecture of the 2-D vector-quantized variational autoencoder (VQVAE) is shown. Arrows pointing to the right are convolution layers, and arrows pointing up are transpose convolution layers. The codebook is a 2-D matrix of embedding vectors. The 3-D VQVAE has 6 downsampling convolutions instead of 4, with the properties of the last downsampling convolution provided in Table S1.
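
The Figure S1 caption describes the codebook as a 2-D matrix of embedding vectors. In a standard VQVAE, each encoder output vector is replaced by its nearest codebook row, and the row indices form the discrete latent representation. The sketch below shows that lookup in plain Python with a toy codebook. The function names and values are illustrative assumptions, and the paper's actual codebook size and embedding dimension are not given on this page.

```python
# Minimal sketch of standard VQVAE quantization (nearest codebook entry).
# Names and toy values are illustrative, not taken from the paper.


def quantize(vectors: list[list[float]], codebook: list[list[float]]) -> list[int]:
    """Map each vector to the index of its nearest codebook row (squared L2 distance)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, entry)) for entry in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def lookup(indices: list[int], codebook: list[list[float]]) -> list[list[float]]:
    """Recover the quantized vectors that would be passed to the decoder."""
    return [codebook[i] for i in indices]


# Toy codebook: 3 embedding vectors of dimension 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
encoded = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8]]
tokens = quantize(encoded, codebook)
```

The resulting integer tokens are the kind of sequence a transformer can be trained on, which is what makes the second stage possible.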