Actor Critic with Experience Replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy
Md Mainul Abrar, Parvat Sapkota, Damon Sprouts, Xun Jia, Yujie Chi
TL;DR
This work develops an ACER-based virtual treatment planner (ACER-VTP) that learns to tune 9 treatment-planning parameters from states derived from dose-volume histograms (DVHs) to optimize prostate IMRT plans within an in-house treatment planning system (TPS). Despite training on a single patient case, the ACER-VTP generalizes well across three independent datasets and achieves an average final plan score of $8.93\pm0.27$, with $93.09\%$ of cases reaching the perfect score of 9, outperforming a DQN-based baseline. The approach uses a two-head neural network (policy and Q-value) with asynchronous online learners and experience replay, employs Retrace-based targets with optional entropy and TRPO stabilization, and demonstrates robustness to FGSM adversarial perturbations. These results indicate potential for real-time, scalable, and robust automatic treatment planning across institutions; future work will explore continuous-action tuning and stronger adversarial testing.
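To make the Retrace-based targets mentioned above concrete, the following is a minimal sketch of the backward recursion in the form given in the general ACER literature. It is not taken from this work's implementation: the function name `retrace_targets`, its arguments, and the default `gamma` and truncation level `c` are illustrative assumptions.

```python
import numpy as np

def retrace_targets(rewards, q_taken, values, rho, bootstrap_value,
                    gamma=0.99, c=1.0):
    """Backward Retrace recursion over one replayed trajectory (sketch).

    rewards[t]      : reward received after the action at step t
    q_taken[t]      : critic estimate Q(x_t, a_t) for the action actually taken
    values[t]       : V(x_t) = sum_a pi(a | x_t) * Q(x_t, a)
    rho[t]          : importance weight pi(a_t | x_t) / mu(a_t | x_t)
    bootstrap_value : V of the state after the last step (0 if terminal)
    All names and defaults are hypothetical, chosen for illustration only.
    """
    rewards = np.asarray(rewards, dtype=float)
    q_taken = np.asarray(q_taken, dtype=float)
    values = np.asarray(values, dtype=float)
    rho = np.asarray(rho, dtype=float)

    targets = np.empty_like(rewards)
    q_ret = float(bootstrap_value)
    for t in reversed(range(len(rewards))):
        # Target for step t: reward plus discounted corrected return.
        q_ret = rewards[t] + gamma * q_ret
        targets[t] = q_ret
        # Truncated importance weight limits how much of the off-policy
        # correction is propagated to the earlier step.
        q_ret = min(c, rho[t]) * (q_ret - q_taken[t]) + values[t]
    return targets
```

With all importance weights equal to zero the recursion collapses to one-step bootstrapped targets, and with on-policy data and a consistent critic it reduces to plain discounted returns, which is a convenient sanity check for any implementation.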
Abstract
Background: Real-time treatment planning in intensity-modulated radiotherapy (IMRT) is challenging due to complex beam interactions. AI has improved automation, but existing models require large, high-quality datasets and lack universal applicability. Deep reinforcement learning (DRL) offers a promising alternative by mimicking human trial-and-error planning.
Purpose: To develop a stochastic policy-based DRL agent for automatic treatment planning that trains efficiently, applies broadly, and is robust against adversarial attacks generated with the Fast Gradient Sign Method (FGSM).
Methods: Using the Actor-Critic with Experience Replay (ACER) architecture, the agent tunes treatment planning parameters (TPPs) in inverse planning. Training is based on prostate cancer IMRT cases, with dose-volume histograms (DVHs) as input. The model is trained on a single patient case, validated on two independent cases, and tested on 300+ plans across three datasets. Plan quality is assessed using ProKnow scores, and robustness is tested against adversarial attacks.
Results: Despite training on a single case, the model generalizes well. Before ACER-based planning, the mean plan score was $6.20\pm1.84$; afterward, the mean rose to $8.93\pm0.27$, with 93.09% of cases achieving the perfect score of 9. The agent effectively prioritizes optimal TPP tuning and remains robust against adversarial attacks.
Conclusions: The ACER-based DRL agent enables efficient, high-quality treatment planning in prostate cancer IMRT, demonstrating strong generalizability and robustness.
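As an illustration of the FGSM robustness test named in the abstract, the sketch below perturbs an input state in the direction of the sign of the loss gradient. It assumes a toy linear-softmax policy so that the gradient can be written in closed form; the actual agent is a neural network, and the names `fgsm_perturb`, `policy_loss`, `weights`, and `epsilon`, as well as the array sizes in the usage lines, are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def policy_loss(state, weights, action):
    # Cross-entropy of the policy with respect to a fixed action.
    return -np.log(softmax(weights @ state)[action])

def fgsm_perturb(state, weights, epsilon=0.01):
    """One FGSM step against a toy linear-softmax policy (assumption).

    The attack targets the action the unperturbed policy prefers and moves
    the state by epsilon along the sign of the loss gradient, which pushes
    probability mass away from that action.
    """
    probs = softmax(weights @ state)
    action = int(np.argmax(probs))
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    # Closed-form gradient of the cross-entropy loss w.r.t. the input state.
    grad = weights.T @ (probs - one_hot)
    adversarial_state = state + epsilon * np.sign(grad)
    return adversarial_state, action

# Usage with made-up sizes: a 30-dimensional state and 18 discrete actions.
rng = np.random.default_rng(0)
weights = rng.normal(size=(18, 30))
state = rng.normal(size=30)
adversarial_state, action = fgsm_perturb(state, weights, epsilon=0.05)
```

A robustness check in this spirit would compare the actions (or final plan scores) obtained from `state` and `adversarial_state` over a range of `epsilon` values.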
