LLM-driven Multimodal Target Volume Contouring in Radiation Oncology

Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Yeona Cho, Ik Jae Lee, Jin Sung Kim, Jong Chul Ye

TL;DR

An LLM-driven multimodal artificial intelligence (AI) model is presented that uses clinical information for the challenging task of 3-dimensional, context-aware target volume delineation in radiation oncology.

Abstract

Target volume contouring for radiation therapy is considered significantly more challenging than normal organ segmentation because it requires both image and text-based clinical information. Inspired by recent advances in large language models (LLMs) that facilitate the integration of textual information and images, we present a novel LLM-driven multimodal AI, namely LLMSeg, that uses clinical text information for target volume contouring in radiation therapy, and we validate it in the context of breast cancer radiation therapy. Using external validation and data-insufficient environments, both of which closely reflect real-world applications, we demonstrate that the proposed model performs markedly better than conventional unimodal AI models, with particularly robust generalization performance and data efficiency. To the best of our knowledge, this is the first LLM-driven multimodal AI model that integrates clinical text information into target volume delineation for radiation oncology.

Paper Structure (21 sections, 3 equations, 4 figures, 6 tables)

This paper contains 21 sections, 3 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of our proposed LLMSeg. (a) Illustration comparing the concepts of traditional vision-only AI and multimodal AI in the context of radiotherapy target volume delineation. (b) Quantitative comparison of CTV contouring performance in the Dice metric. The Dice metric for each trial is presented with whiskers representing the range from minimum to maximum values. The center line indicates the median, the bounds of the box represent the interquartile range (from the lower quartile to the upper quartile), and the x mark indicates the mean. n denotes the number of patients. The p-values indicate the statistically significant superiority of the proposed multimodal LLMSeg. All statistical tests were two-sided. (c) Visual assessment of each concept. Source data are provided as a Source Data file.
  • Figure 2: Comparison of target contouring performance based on varying training dataset sizes. (a) Quantitative comparison for all the validation sets. The Dice metric for each trial is presented as mean values (center lines) with 95% confidence intervals calculated with the non-parametric bootstrap method (shaded areas). n denotes the number of patients. (b) Visual comparison for external validation #1. Source data are provided as a Source Data file.
  • Figure 3: Analysis of clinical data alignment for target contouring. (a) Illustration of modification of the input clinical data, given the same CT scan. Red font indicates modified input text. (b-c) Visual assessment of radiotherapy target contouring with modified input clinical data.
  • Figure 4: Qualitative comparison of different multimodal methods with omitted clinical data components. (a) Comparison with the numeric category method: in Case 1 (left breast, T2N1M0, post-mastectomy) and Case 2 (left breast, T2N1M0, post-breast conservation surgery), our method (LLMSeg) accurately includes surgically treated areas and regional nodes, while the numeric category method inaccurately segments both breasts, missing the clinical context. (b) Omission experiment for tumor information: for right breast T1aN0M0 cancer, our method segments accurately when no information is omitted. Omitting T stage, N stage, or laterality causes incorrect regional node inclusion or contours on the opposite breast. The competing method is inaccurate regardless of omission. (c) Omission experiment for surgery information: in left breast T1cN1M0 cancer post-mastectomy, our method without surgery information produces contours resembling those for breast-conserving surgery. The competing method inaccurately contours the opposite breast irrespective of surgery information.
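
The captions for Figures 1 and 2 report the Dice metric per patient and a mean with a non-parametric bootstrap confidence interval. The sketch below illustrates one common way to compute both. It is not the authors' evaluation code: the function names `dice_score` and `bootstrap_mean_ci`, the resample count, the percentile-bootstrap variant, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Mean of `values` with a percentile-bootstrap (1 - alpha) interval."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample patients with replacement; one row per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    replicate_means = values[idx].mean(axis=1)
    lower, upper = np.quantile(replicate_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lower), float(upper)


if __name__ == "__main__":
    # Toy 3D masks standing in for predicted and reference target volumes.
    reference = np.zeros((4, 4, 4), dtype=bool)
    reference[1:3, 1:3, 1:3] = True
    prediction = np.zeros((4, 4, 4), dtype=bool)
    prediction[1:3, 1:3, 1:4] = True
    print("Dice:", dice_score(prediction, reference))

    # Hypothetical per-patient Dice values, for illustration only.
    per_patient = [0.70, 0.80, 0.90, 0.85, 0.75]
    print("mean, lower, upper:", bootstrap_mean_ci(per_patient))
```

Resampling is done at the patient level because each Dice value corresponds to one patient, which matches the "n denotes the number of patients" convention in the captions.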