Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy
Riqiang Gao, Florin C. Ghesu, Simon Arberet, Shahab Basiri, Esa Kuusela, Martin Kraus, Dorin Comaniciu, Ali Kamen
TL;DR
The paper addresses leaf sequencing in radiotherapy planning by introducing the Reinforced Leaf Sequencer (RLS), a deep multi-agent reinforcement learning model that operates under a finite horizon with a two-level action space to predict leaf positions and monitor units (MUs). It adapts a two-level PPO framework with leaf and MU policies and a critic, and couples a cropping strategy with a post-processing rule to handle heterogeneous fluence patterns and varying sector lengths. Across four datasets and multiple planning contexts, RLS achieves lower fluence reconstruction error (MNSE) than a leading optimization sequencer and can converge faster when integrated into an optimization planner. It also yields competitive 3D dose and DVH scores in OpenKBP-like evaluations and proves viable in an end-to-end AI VMAT pipeline. The work highlights the potential of MARL to accelerate radiotherapy planning and enable end-to-end integration with other AI modules, while acknowledging limitations related to data availability, training cost, and evaluation scope, and outlining future directions toward full end-to-end training and broader clinical validation.
Abstract
In contemporary radiotherapy planning (RTP), leaf sequencing is a key module that is predominantly addressed by optimization-based approaches. In this paper, we propose a novel deep reinforcement learning (DRL) model, termed Reinforced Leaf Sequencer (RLS), that performs leaf sequencing in a multi-agent framework. The RLS model improves on time-consuming iterative optimization steps through large-scale training and can control movement patterns through the design of reward mechanisms. We conducted experiments on four datasets with four metrics and compared our model with a leading optimization sequencer. Our findings show that the proposed RLS model can achieve lower fluence reconstruction errors and potentially faster convergence when integrated into an optimization planner. Additionally, RLS has shown promising results in a full artificial intelligence RTP pipeline. We hope this pioneering multi-agent RL leaf sequencer can foster future research on machine learning for RTP.
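To make the notion of fluence reconstruction error concrete, the following is a minimal sketch of how a sequence of leaf openings and monitor units could be accumulated into a fluence map and compared with a target. It is an illustration under stated assumptions, not the RLS implementation: the discretized grid, the `(left, right)` leaf-opening convention, and the names `reconstruct_fluence` and `normalized_squared_error` are hypothetical, and the error shown is a generic normalized squared error rather than the exact metric definition used in the experiments.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one aperture = per-row (left, right) leaf
# openings on a discrete grid, plus the monitor units (MU) delivered with it.
Aperture = Tuple[Sequence[Tuple[int, int]], float]


def reconstruct_fluence(apertures: Sequence[Aperture],
                        n_rows: int, n_cols: int) -> List[List[float]]:
    """Accumulate MU over the open columns [left, right) of every leaf row."""
    fluence = [[0.0] * n_cols for _ in range(n_rows)]
    for leaves, mu in apertures:
        for r, (left, right) in enumerate(leaves):
            for c in range(max(left, 0), min(right, n_cols)):
                fluence[r][c] += mu
    return fluence


def normalized_squared_error(target: Sequence[Sequence[float]],
                             recon: Sequence[Sequence[float]]) -> float:
    """Squared error normalized by target energy (an assumed, generic form)."""
    num = sum((t - p) ** 2
              for t_row, p_row in zip(target, recon)
              for t, p in zip(t_row, p_row))
    den = sum(t ** 2 for t_row in target for t in t_row)
    if den == 0:
        raise ValueError("target fluence must contain a nonzero entry")
    return num / den


# Toy 2x4 target fluence and two apertures that reproduce it exactly.
target = [[0.0, 2.0, 2.0, 0.0],
          [1.0, 3.0, 2.0, 0.0]]
apertures = [
    ([(1, 3), (1, 3)], 2.0),  # both rows open over columns 1-2, 2 MU
    ([(0, 0), (0, 2)], 1.0),  # row 0 closed, row 1 open over columns 0-1, 1 MU
]
recon = reconstruct_fluence(apertures, n_rows=2, n_cols=4)
err = normalized_squared_error(target, recon)
```

In this toy case the two apertures reproduce the target, so the error is zero. A sequencer, whether optimization-based or learned, has to choose the leaf openings and MU values so that this kind of error stays small for realistic fluence maps.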
