Actor Critic with Experience Replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy

Md Mainul Abrar, Parvat Sapkota, Damon Sprouts, Xun Jia, Yujie Chi

TL;DR

This work develops an ACER-based virtual treatment planner that learns to tune 9 treatment-planning parameters from DVH-derived states to optimize prostate IMRT plans within an in-house TPS. Despite training on a single patient case, the ACER-VTP generalizes well across three independent datasets and achieves an average final plan score of $8.93\pm0.27$, with $93.09\%$ of cases reaching the perfect score of 9, outperforming a DQN-based baseline. The approach uses a two-head neural network (policy and Q-value) with asynchronous online learners and experience replay, employing retrace-based targets and optional entropy/TRPO stabilization, and demonstrates robustness to FGSM adversarial perturbations. These results indicate potential for real-time, scalable, and robust automatic treatment planning across institutions, with future work exploring continuous-action tuning and stronger adversarial testing.
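
The retrace-based targets mentioned above can be illustrated with the backward recursion commonly used in the ACER literature. This is a minimal sketch, not the authors' implementation: the function name `retrace_targets`, its argument layout, and the default discount factor are assumptions, and the exact variant used in the paper is not given in this summary.

```python
import numpy as np

def retrace_targets(rewards, q_taken, values, rho, gamma=0.99, bootstrap=0.0):
    """Backward Retrace recursion over one replayed trajectory (hypothetical helper).

    rewards[t] : reward received after taking action a_t in state x_t
    q_taken[t] : critic estimate Q(x_t, a_t)
    values[t]  : V(x_t) = sum_a pi(a | x_t) * Q(x_t, a)
    rho[t]     : importance ratio pi(a_t | x_t) / mu(a_t | x_t)
    bootstrap  : V(x_T) for a truncated trajectory, 0.0 if x_T is terminal
    """
    rewards = np.asarray(rewards, dtype=float)
    q_taken = np.asarray(q_taken, dtype=float)
    values = np.asarray(values, dtype=float)
    # Truncate the importance ratios at 1 to bound the variance of the target.
    rho_bar = np.minimum(1.0, np.asarray(rho, dtype=float))

    targets = np.empty_like(rewards)
    q_ret = float(bootstrap)
    for t in reversed(range(len(rewards))):
        q_ret = rewards[t] + gamma * q_ret
        targets[t] = q_ret
        # Off-policy correction carried back to step t - 1.
        q_ret = rho_bar[t] * (q_ret - q_taken[t]) + values[t]
    return targets

# On-policy data with a zero critic reduces to plain discounted returns.
on_policy = retrace_targets([1, 1, 1], [0, 0, 0], [0, 0, 0], [1, 1, 1], gamma=0.5)
# Zero importance ratios reduce to one-step targets r_t + gamma * V(x_{t+1}).
one_step = retrace_targets([1, 1, 1], [0.1, 0.3, 0.5], [0.2, 0.4, 0.6], [0, 0, 0], gamma=0.5)
```

The two calls show the limiting cases: with ratios of 1 and a consistent critic the target follows the observed return, while with ratios of 0 it falls back to the critic's own one-step estimate.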

Abstract

Background: Real-time treatment planning in IMRT is challenging due to complex beam interactions. AI has improved automation, but existing models require large, high-quality datasets and lack universal applicability. Deep reinforcement learning (DRL) offers a promising alternative by mimicking human trial-and-error planning. Purpose: To develop a stochastic policy-based DRL agent for automatic treatment planning with efficient training, broad applicability, and robustness against adversarial attacks generated with the Fast Gradient Sign Method (FGSM). Methods: Using the Actor-Critic with Experience Replay (ACER) architecture, the agent tunes treatment planning parameters (TPPs) in inverse planning. Training is based on prostate cancer IMRT cases, using dose-volume histograms (DVHs) as input. The model is trained on a single patient case, validated on two independent cases, and tested on 300+ plans across three datasets. Plan quality is assessed using ProKnow scores, and robustness is tested against FGSM adversarial attacks. Results: Despite training on a single case, the model generalizes well. Before ACER-based planning, the mean plan score was $6.20\pm1.84$; after, 93.09% of cases achieved a perfect score of 9, with a mean of $8.93\pm0.27$. The agent effectively prioritizes optimal TPP tuning and remains robust against adversarial attacks. Conclusions: The ACER-based DRL agent enables efficient, high-quality treatment planning in prostate cancer IMRT, demonstrating strong generalizability and robustness.
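
To make the FGSM robustness test concrete, the sketch below perturbs a DVH-like input in the direction that most increases the loss of a policy's chosen action. It is an illustration under stated assumptions only: the linear softmax policy (`W`, `b`), the input length of 300, the step size `eps`, and the clipping of inputs to [0, 1] are all hypothetical stand-ins, since the abstract does not specify the attack configuration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nll(x, W, b, action):
    """Negative log-probability of `action` under a linear softmax policy."""
    return -np.log(softmax(W @ x + b)[action])

def fgsm_perturb(x, W, b, action, eps=0.01):
    """One FGSM step: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[action] = 1.0
    grad_x = W.T @ (p - onehot)      # analytic gradient of nll w.r.t. x
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# Hypothetical setup: a 300-point DVH vector and an 18-action policy.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=300)
W = rng.normal(scale=0.1, size=(18, 300))
b = np.zeros(18)

action = int(np.argmax(W @ x + b))           # action the clean policy prefers
x_adv = fgsm_perturb(x, W, b, action, eps=0.01)

loss_clean = nll(x, W, b, action)
loss_adv = nll(x_adv, W, b, action)
```

A robust agent would be expected to keep selecting sensible TPP adjustments on `x_adv` even though the perturbation raises the loss of its original choice; for a deep network the gradient would come from automatic differentiation rather than the closed form used here.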

Paper Structure

This paper contains 15 sections, 10 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: The workflow of the actor critic with experience replay (ACER)-based automatic treatment planning process. Here, 'TPP' represents treatment planning parameter.
  • Figure 2: The deep neural network featuring a 'two-head' structure designed for the actor critic with experience replay (ACER)-based virtual treatment planner (VTP). It takes the dose volume histogram (DVH) of the current plan as input. It outputs a treatment planning parameter (TPP) tuning strategy across 18 actions in one head, and produces the corresponding Q-value in the other head.
  • Figure 3: The convergence map of the actor critic with experience replay (ACER)-based virtual treatment planner (VTP) training process, evaluated based on the average plan score for the validation patient cases.
  • Figure 4: (a)-(b) The plan score distributions for 49 test cases generated under trivial treatment planning parameter (TPP) settings and 147 test cases generated under random TPP settings from 49 patient cases in dataset 1, respectively, before and after actor critic with experience replay (ACER)-guided treatment planning. The histogram width is set to 1. (c)-(d) The mean and standard deviation of the plan score distributions before and after ACER-based treatment planning for the cases shown in (a) and (b), respectively. The groups in (c) and (d) correspond one-to-one with the histogram distributions in (a) and (b). ACER-based treatment planning significantly improves plan quality, achieving a mean score close to 9 across all plan groups.
  • Figure 5: The plan score distributions before and after actor critic with experience replay (ACER)-guided treatment planning for (a) 30 test cases generated from a single patient case in dataset 2, and (b) 90 test cases generated from 30 patient cases in dataset 3 under random treatment planning parameter (TPP) initializations.
  • ...and 3 more figures
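
The 'two-head' structure described in the Figure 2 caption can be sketched as a shared trunk feeding a policy head and a Q-value head. This is a minimal illustration rather than the paper's architecture: the single hidden layer, its width, the input length, and the mapping of 18 actions to 9 TPPs times two tuning directions are assumptions introduced here.

```python
import numpy as np

N_TPP = 9                  # number of treatment planning parameters (from the text)
N_ACTIONS = 2 * N_TPP      # assumed: one "increase" and one "decrease" action per TPP

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def init_params(n_in, n_hidden=64, seed=0):
    """Random weights for a hypothetical one-hidden-layer trunk and two heads."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W1": rng.normal(scale=scale, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "Wp": rng.normal(scale=scale, size=(N_ACTIONS, n_hidden)),
        "bp": np.zeros(N_ACTIONS),
        "Wq": rng.normal(scale=scale, size=(N_ACTIONS, n_hidden)),
        "bq": np.zeros(N_ACTIONS),
    }

def forward(params, dvh):
    """Return (policy, q_values, state_value) for one DVH input vector."""
    h = np.tanh(params["W1"] @ dvh + params["b1"])      # shared features
    pi = softmax(params["Wp"] @ h + params["bp"])       # policy head
    q = params["Wq"] @ h + params["bq"]                 # Q-value head
    v = float(pi @ q)                                   # V(x) = sum_a pi(a|x) Q(x, a)
    return pi, q, v

def decode_action(a):
    """Map an action index to (TPP index, direction) under the assumed layout."""
    return a // 2, (+1 if a % 2 == 0 else -1)

params = init_params(n_in=300)
dvh = np.random.default_rng(1).uniform(0.0, 1.0, size=300)
pi, q, v = forward(params, dvh)
tpp_index, direction = decode_action(int(np.argmax(pi)))
```

Sharing the trunk lets both heads read the same DVH features, and the state value falls out of the two outputs without a third head.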