Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy

Riqiang Gao, Florin C. Ghesu, Simon Arberet, Shahab Basiri, Esa Kuusela, Martin Kraus, Dorin Comaniciu, Ali Kamen

TL;DR

The paper addresses leaf sequencing in radiotherapy planning (RTP) by introducing the Reinforced Leaf Sequencer (RLS), a deep multi-agent reinforcement learning model that operates under a finite horizon with a two-level action space to predict leaf positions and monitor units (MUs). It adapts a two-level PPO framework with leaf and MU policies and a critic, and combines it with a cropping strategy and a post-processing rule to handle heterogeneous fluence patterns and varying sector lengths. Across four datasets and multiple planning contexts, RLS achieves lower fluence reconstruction error (MNSE) than a leading optimization sequencer and shows potentially faster convergence when integrated into an optimization planner. It also obtains competitive 3D dose and DVH scores in OpenKBP-like evaluations and proves viable in an end-to-end AI VMAT pipeline. The work highlights the potential of multi-agent RL (MARL) to accelerate RTP and to enable end-to-end integration with other AI modules. It acknowledges limitations related to data availability, training cost, and evaluation scope, and outlines future directions toward full end-to-end training and broader clinical validation.
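To make the two-level action space concrete, here is a minimal sketch of how per-row leaf positions plus one shared MU per control point could be turned into a cumulated fluence map and scored against the target. It assumes an idealized binary aperture (each leaf pair exposes the columns between its two leaf tips, with no leaf transmission or penumbra). `normalized_squared_error` is a hypothetical stand-in metric; the paper's exact MNSE definition is not reproduced here.

```python
from typing import List, Sequence, Tuple

Pos = Tuple[int, int]  # (leaf A tip, leaf B tip) for one leaf pair
ControlPoint = Tuple[Sequence[Pos], float]  # per-row leaf positions + one shared MU


def aperture_row(a: int, b: int, width: int) -> List[float]:
    """Idealized opening of one leaf pair: 1.0 on columns a <= j < b, else 0.0."""
    return [1.0 if a <= j < b else 0.0 for j in range(width)]


def cumulated_fluence(cps: Sequence[ControlPoint], rows: int, width: int) -> List[List[float]]:
    """Sum MU-weighted apertures over all control points (no transmission or penumbra)."""
    fluence = [[0.0] * width for _ in range(rows)]
    for leaf_positions, mu in cps:
        for x, (a, b) in enumerate(leaf_positions):
            for j, open_frac in enumerate(aperture_row(a, b, width)):
                fluence[x][j] += mu * open_frac
    return fluence


def normalized_squared_error(pred: Sequence[Sequence[float]],
                             target: Sequence[Sequence[float]]) -> float:
    """Hypothetical stand-in metric: sum of squared errors / sum of squared target values."""
    num = sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    den = sum(t ** 2 for tr in target for t in tr)
    return num / den if den > 0 else 0.0


target = [[0, 1, 2, 1], [0, 2, 2, 0]]
cps = [([(1, 4), (1, 3)], 1.0),   # CP_1
       ([(2, 3), (1, 3)], 1.0)]   # CP_2
fluence = cumulated_fluence(cps, rows=2, width=4)
print(fluence)                                    # [[0.0, 1.0, 2.0, 1.0], [0.0, 2.0, 2.0, 0.0]]
print(normalized_squared_error(fluence, target))  # 0.0
```

Here two control points happen to reproduce the toy target exactly; the point is only to show how leaf-level and MU-level actions jointly determine the fluence that the reconstruction error is measured on.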

Abstract

In contemporary radiotherapy planning (RTP), leaf sequencing, a key module, is predominantly addressed by optimization-based approaches. In this paper, we propose a novel deep reinforcement learning (DRL) model, termed the Reinforced Leaf Sequencer (RLS), in a multi-agent framework for leaf sequencing. Through large-scale training, the RLS model improves on time-consuming iterative optimization steps, and it can control leaf movement patterns through the design of its reward mechanisms. We conducted experiments on four datasets with four metrics and compared our model with a leading optimization sequencer. Our findings show that the proposed RLS model achieves reduced fluence reconstruction errors and potentially faster convergence when integrated into an optimization planner. Additionally, RLS has shown promising results in a full artificial intelligence (AI) RTP pipeline. We hope this pioneering multi-agent RL leaf sequencer can foster future research on machine learning for RTP.
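The claim that reward design can control movement patterns can be illustrated with a hypothetical per-agent reward. The sketch below is not the paper's Reward 1 or Reward 2; it simply assumes that a leaf actor is rewarded for reducing its row's squared fluence error and penalized, with an assumed weight `move_weight`, for total leaf travel between consecutive control points.

```python
from typing import Sequence, Tuple

Pos = Tuple[int, int]  # (leaf A tip, leaf B tip)


def row_sq_error(cum_row: Sequence[float], target_row: Sequence[float]) -> float:
    """Squared fluence error of one leaf-pair row."""
    return sum((c - t) ** 2 for c, t in zip(cum_row, target_row))


def leaf_reward(target_row: Sequence[float],
                cum_before: Sequence[float], cum_after: Sequence[float],
                prev_pos: Pos, new_pos: Pos, move_weight: float = 0.25) -> float:
    """Hypothetical per-agent reward: error reduction minus a leaf-travel penalty.

    move_weight is an assumed hyperparameter, not a value from the paper.
    """
    improvement = row_sq_error(cum_before, target_row) - row_sq_error(cum_after, target_row)
    travel = abs(new_pos[0] - prev_pos[0]) + abs(new_pos[1] - prev_pos[1])
    return improvement - move_weight * travel


target_row = [0, 1, 1, 0]
before, after = [0, 0, 0, 0], [0, 1, 1, 0]
# Same fluence improvement, different leaf travel:
print(leaf_reward(target_row, before, after, (0, 0), (1, 3)))  # 1.0  (travel 4)
print(leaf_reward(target_row, before, after, (1, 2), (1, 3)))  # 1.75 (travel 1)
```

Both calls achieve the same error reduction, but the smaller leaf movement earns the higher reward; raising `move_weight` in such a scheme would trade reconstruction accuracy for smoother leaf motion.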

Paper Structure (30 sections, 7 equations, 16 figures, 8 tables, 1 algorithm)

Figures (16)

  • Figure 1: Illustration of a typical RTP process. Three common components are shown in the orange boxes. We focus on leaf sequencing in this work. In this paper, the term "optimization" refers to methods that do not use machine learning.
  • Figure 2: Control point distribution: each black dot signifies a control point. (a) In VMAT, control points can be approximately evenly spaced along the designated arc during therapy and may be divided into multiple sectors during planning and optimization. (b) An IMRT plan comprises several fields, each encompassing multiple control points at the same gantry angle.
  • Figure 3: (a) shows a 2D illustration of multi-leaf pairs, with the middle depicting the PTV projection. (b) provides a 3D view of a leaf pair and its connection to cumulated fluences. (c) illustrates the motivations for Reward 1 (green) and Reward 2 (red) by comparing cumulated and target fluences. Details are in the background appendix.
  • Figure 4: Using finite-horizon RL for accelerated inference. Conventional optimization methods start with an estimate of the leaf positions and MUs and iteratively refine it until convergence or until the stopping criteria are met. In principle, vanilla RL could also be applied in an infinite-horizon setting to iteratively refine the estimates. However, for greater efficiency during inference, we train RLS to execute only once for each control point (CP).
  • Figure 5: Illustration of the proposed RLS. The upper panel shows the methodology and the lower panel shows the input/output of RLS. The target fluence is split into $X$ rows, each associated with one leaf pair and one leaf actor. The $x$-th leaf actor predicts the positions of leaves $A_x$ and $B_x$. All rows in the $k$-th control point (CP) share the same monitor unit (MU) value, which the MU actor predicts after all leaf positions are obtained. The state of a leaf actor at $CP_{k+1}$ includes the target fluence, the cumulated fluence of $CP_{1} \sim CP_{k}$, and the leaf positions of $CP_{k}$. The state of the MU actor is similar but replaces the leaf positions of $CP_{k}$ with those of $CP_{k+1}$.
  • ...and 11 more figures
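The finite-horizon, two-level decision order described in the Figure 4 and Figure 5 captions can be sketched as a single pass over control points: every leaf actor first picks its leaf pair's positions from its own row state, then one MU actor picks a shared MU for that CP, and the cumulated fluence is updated. In this sketch the trained PPO actors are replaced by hypothetical greedy heuristics (`greedy_leaf_policy`, `min_residual_mu`) so the loop runs without a model; only the state inputs and the action order follow the captions.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Pos = Tuple[int, int]  # (leaf A tip, leaf B tip); row x is open on columns a <= j < b

# Hypothetical policy signatures standing in for the trained leaf and MU actors.
LeafPolicy = Callable[[Sequence[float], Sequence[float], Pos], Pos]
MuPolicy = Callable[[Sequence[Row], Sequence[Row], Sequence[Pos]], float]


def greedy_leaf_policy(target_row: Sequence[float], cum_row: Sequence[float], prev_pos: Pos) -> Pos:
    """Heuristic leaf actor (not learned): open over the longest run of positive residual.

    prev_pos is part of the leaf actor's state in Figure 5 but is ignored by this heuristic.
    """
    residual = [t - c for t, c in zip(target_row, cum_row)]
    best: Pos = (0, 0)  # (0, 0) means the leaf pair stays closed
    start = None
    for j, r in enumerate(residual + [0.0]):  # trailing 0.0 closes a run at the edge
        if r > 0 and start is None:
            start = j
        elif r <= 0 and start is not None:
            if j - start > best[1] - best[0]:
                best = (start, j)
            start = None
    return best


def min_residual_mu(target: Sequence[Row], cum: Sequence[Row], positions: Sequence[Pos]) -> float:
    """Heuristic MU actor (not learned): largest MU that overshoots no open bixel."""
    vals = [t - c
            for x, (a, b) in enumerate(positions)
            for t, c in zip(target[x][a:b], cum[x][a:b])]
    return min(vals) if vals else 0.0


def rollout(target: Sequence[Row], num_cps: int,
            leaf_policy: LeafPolicy = greedy_leaf_policy,
            mu_policy: MuPolicy = min_residual_mu):
    """Finite horizon: one forward pass per control point, no iterative refinement."""
    rows, width = len(target), len(target[0])
    cum: List[Row] = [[0.0] * width for _ in range(rows)]
    positions: List[Pos] = [(0, 0)] * rows  # leaf pairs closed before CP_1
    plan = []
    for _ in range(num_cps):
        # Level 1: leaf actor x sees (target row, cumulated row, previous positions).
        positions = [leaf_policy(target[x], cum[x], positions[x]) for x in range(rows)]
        # Level 2: one MU shared by all rows of this CP, chosen after all leaves are set.
        mu = mu_policy(target, cum, positions)
        for x, (a, b) in enumerate(positions):
            for j in range(a, b):
                cum[x][j] += mu
        plan.append((positions, mu))
    return plan, cum


target = [[0, 1, 2, 1], [0, 2, 2, 0]]
plan, cum = rollout(target, num_cps=2)
print(plan)  # [([(1, 4), (1, 3)], 1.0), ([(2, 3), (1, 3)], 1.0)]
print(cum)   # [[0.0, 1.0, 2.0, 1.0], [0.0, 2.0, 2.0, 0.0]]
```

Passing learned policies with the same signatures in place of the heuristics would leave the loop unchanged; the single pass per CP is what distinguishes this finite-horizon setup from an iterative refine-until-convergence loop.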