Transforming Multimodal Models into Action Models for Radiotherapy

Matteo Ferrante, Alessandra Carosi, Rolando Maria D'Angelillo, Nicola Toschi

TL;DR

This work tackles the inefficiency and variability of radiotherapy treatment planning (TP) by converting a large multimodal foundation model (MLM) into an action model through few-shot reinforcement learning (RL), guided by a Monte Carlo evaluator. The Text-to-Plan approach relies on the physics, radiation, and anatomy knowledge already present in a pre-trained MLM to iteratively choose gantry angles and improve the resulting dose distribution, achieving higher rewards than a conventional RL baseline and a random baseline on prostate data. The results show improved dose conformity to the target and better sparing of organs at risk, suggesting potential for faster, more standardized clinical TP. However, the limited medical reliability of current large language models and their reliance on 2D vision representations indicate that 3D-aware medical foundation models and adapters are needed before clinical deployment.

Abstract

Radiotherapy is a crucial cancer treatment that demands precise planning to balance tumor eradication and preservation of healthy tissue. Traditional treatment planning (TP) is iterative, time-consuming, and reliant on human expertise, which can introduce variability and inefficiency. We propose a novel framework to transform a large multimodal foundation model (MLM) into an action model for TP using a few-shot reinforcement learning (RL) approach. Our method leverages the MLM's extensive pre-existing knowledge of physics, radiation, and anatomy, enhancing it through a few-shot learning process. This allows the model to iteratively improve treatment plans using a Monte Carlo simulator. Our results demonstrate that this method outperforms conventional RL-based approaches in both quality and efficiency, achieving higher reward scores and better dose distributions in simulations on prostate cancer data. This proof-of-concept suggests a promising direction for integrating advanced AI models into clinical workflows, potentially enhancing the speed, quality, and standardization of radiotherapy treatment planning.
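
The propose, evaluate, and refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_reward` is a hypothetical stand-in for the Monte Carlo evaluator, `perturb_proposer` is a hypothetical stand-in for the multimodal model, and the beam count, perturbation size, and iteration count are arbitrary choices made for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple

Plan = Tuple[float, ...]            # gantry angles in degrees
History = List[Tuple[Plan, float]]  # (plan, reward) pairs seen so far


def toy_reward(angles: Sequence[float]) -> float:
    """Hypothetical stand-in for the Monte Carlo evaluator.

    Rewards plans whose beams are evenly spread around the patient.
    A real evaluator would simulate dose and score target coverage and
    organ-at-risk sparing. Higher (less negative) is better.
    """
    ordered = sorted(a % 360.0 for a in angles)
    n = len(ordered)
    ideal_gap = 360.0 / n
    gaps = [(ordered[(i + 1) % n] - ordered[i]) % 360.0 for i in range(n)]
    return -sum(abs(g - ideal_gap) for g in gaps)


def perturb_proposer(history: History, rng: random.Random) -> Plan:
    """Hypothetical stand-in for the multimodal model.

    Takes the best plan seen so far and jitters each angle. In the paper's
    setting, the model would instead read past plans and rewards as
    few-shot context and propose new angles.
    """
    best_plan, _ = max(history, key=lambda item: item[1])
    return tuple((a + rng.uniform(-20.0, 20.0)) % 360.0 for a in best_plan)


def refine_plan(
    initial: Sequence[float],
    propose: Callable[[History, random.Random], Plan],
    evaluate: Callable[[Sequence[float]], float],
    n_iters: int = 50,
    seed: int = 0,
) -> Tuple[Plan, float]:
    """Iteratively propose plans, score them, and return the best one."""
    rng = random.Random(seed)
    start = tuple(initial)
    history: History = [(start, evaluate(start))]
    for _ in range(n_iters):
        candidate = propose(history, rng)
        history.append((candidate, evaluate(candidate)))
    return max(history, key=lambda item: item[1])


if __name__ == "__main__":
    start = (0.0, 10.0, 20.0, 30.0, 40.0)
    best_plan, best_reward = refine_plan(start, perturb_proposer, toy_reward)
    print("initial reward:", toy_reward(start))
    print("best reward:   ", best_reward)
```

Because the history always contains the initial plan, the returned reward can never be worse than the starting one. How much it improves depends entirely on the proposer and the evaluator that are plugged in.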

Paper Structure

This paper contains 10 sections, 1 equation, 2 figures.

Figures (2)

  • Figure 1: Workflow of an action model for treatment planning. The model processes a patient's CT scan to determine optimal gantry angles; the resulting dose distribution is evaluated by Monte Carlo simulation to produce a reward score. In the example shown, an initial plan with a reward of -300 improves to -180 after refinement by the pretrained multimodal vision-language model, illustrating the iterative enhancement.
  • Figure 2: A) Dose Volume Histograms (DVHs) for baseline, RL, and Text-to-Plan models, showing dose distribution to the target (PTV) and organs at risk (OARs). B) Dose distribution maps for the baseline (left), RL (middle), and Text-to-Plan (right) models. C) Box plot of the reward values for Text-to-Plan, RL, and random methods, demonstrating the superior performance of the Text-to-Plan approach in optimizing treatment plans.
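
For readers unfamiliar with the dose volume histograms in Figure 2A, a cumulative DVH reports, for each dose level, the percentage of a structure's volume that receives at least that dose. The sketch below is a generic illustration with made-up voxel doses, assuming equal-volume voxels; the function name and the numbers are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def cumulative_dvh(doses: Sequence[float], dose_levels: Sequence[float]) -> List[float]:
    """Percent of structure volume receiving at least each dose level.

    Assumes every voxel has the same volume, so volume fraction equals
    voxel fraction.
    """
    if not doses:
        raise ValueError("structure contains no voxels")
    n = len(doses)
    return [100.0 * sum(1 for d in doses if d >= level) / n for level in dose_levels]


if __name__ == "__main__":
    # Made-up per-voxel doses (Gy) for a tiny target structure.
    ptv_doses = [70.0, 72.0, 68.0, 71.0]
    levels = [0.0, 69.0, 71.0, 75.0]
    print(cumulative_dvh(ptv_doses, levels))  # [100.0, 75.0, 50.0, 0.0]
```

For a target volume the curve should ideally stay near 100% up to the prescribed dose and then drop sharply, while for an organ at risk it should fall off at low doses.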