Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring

Tiantian Zhang, Manxi Lin, Hongda Guo, Xiaofan Zhang, Ka Fung Peter Chiu, Aasa Feragen, Qi Dou

TL;DR

This work tackles automatic PI-RADS scoring for prostate cancer by embedding the PI-RADS Clinical Guideline (PICG) into an automated pipeline using a guideline network built on a Multi-modal Large Language Model (MLLM). It introduces a two-stage fine-tuning strategy with a domain adapter to adapt 3D MRI data and a PICG-to-instructions stage to produce PICG-guided image features, which are aligned with scoring-network representations via KL-divergence-based distillation. Experiments on a public MRI dataset and a private heterogeneous test set demonstrate consistent accuracy and error reductions across multiple state-of-the-art scoring networks, validating the approach’s effectiveness and generalizability. The method is model-agnostic and requires no additional annotations or network changes, offering a practical plug-in to enhance real-world PI-RADS scoring and potentially improve interpretability.

Abstract

The Prostate Imaging Reporting and Data System (PI-RADS) is pivotal in the diagnosis of clinically significant prostate cancer through MRI. Current deep learning-based PI-RADS scoring methods often fail to incorporate the PI-RADS clinical guideline (PICG) that radiologists commonly use, potentially compromising scoring accuracy. This paper introduces a novel approach that adapts a multi-modal large language model (MLLM) to incorporate PICG into a PI-RADS scoring model without additional annotations or network parameters. We present a two-stage fine-tuning process that adapts an MLLM originally trained on natural images to MRI images while effectively integrating the PICG. Specifically, in the first stage, we develop a domain adapter layer tailored for processing 3D MRI inputs and instruct the MLLM to differentiate MRI sequences. In the second stage, we translate PICG into guiding instructions for the model to generate PICG-guided image features. Through a feature distillation step, we then align the scoring network's features with the PICG-guided image features, which enables the scoring network to effectively incorporate the PICG information. We develop our model on a public dataset and evaluate it on an in-house dataset. Experimental results demonstrate that our approach effectively improves the performance of current scoring networks. Code is available at: https://github.com/med-air/PICG2scoring
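
The feature alignment described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that both feature vectors are turned into distributions with a softmax, that the PICG-guided features act as the fixed target, and that the distillation term is added to the classification loss with a scalar weight. The names `distillation_loss`, `total_loss`, `temperature`, and `weight` are hypothetical and chosen only for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Turn a feature vector into a probability distribution."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(guideline_feat, scoring_feat, temperature=1.0):
    """KL term pulling scoring-network features toward PICG-guided features.

    Assumption: the guideline features are the target distribution (first
    argument of KL); the source does not state the direction in this section.
    """
    target = softmax(guideline_feat, temperature)
    student = softmax(scoring_feat, temperature)
    return kl_divergence(target, student)


def total_loss(classification_loss, guideline_feat, scoring_feat, weight=1.0):
    """Hypothetical combined objective: scoring loss plus weighted KL term."""
    return classification_loss + weight * distillation_loss(
        guideline_feat, scoring_feat
    )


if __name__ == "__main__":
    picg_guided = [1.0, 2.0, 3.0]   # toy PICG-guided image feature
    scoring = [3.0, 2.0, 1.0]       # toy scoring-network feature
    print("KL term:", distillation_loss(picg_guided, scoring))
    print("total:", total_loss(0.5, picg_guided, scoring, weight=0.1))
```

Because the extra term only touches the training objective, a sketch like this leaves the scoring network's architecture unchanged, which matches the claim that no additional network parameters are required.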

Paper Structure

This paper contains 17 sections, 2 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Overview of the proposed method, which consists of two-stage instruction tuning and feature distillation. In stage one, we design and train the domain adapter layer and use instruction tuning to distinguish "T2W" and "ADC&DWI" sequences. In stage two, we freeze the domain adapter layer and design another instruction to learn PICG. The domain adapter layer, image encoder, projection, and LLaMA constitute the guideline network, while the image classification encoder and classifier make up the scoring network.
  • Figure 2: Example of instructions for adapting the guideline network to prostate MRI.
  • Figure 3: Examples of instructions used for generating PICG-guided image features.
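
The module grouping and the two-stage schedule in the Figure 1 caption can be written down as a small configuration sketch. Only two facts come from the caption: the domain adapter layer is trained in stage one and frozen in stage two, and the six modules split into a guideline network and a scoring network. Everything else is an assumption for illustration, including the names `StagePlan`, `build_stage_plan`, and the `tune_rest` flag, which stands in for the unspecified trainability of the remaining guideline-network modules.

```python
from dataclasses import dataclass

# Module grouping as listed in the Figure 1 caption.
GUIDELINE_MODULES = ("domain_adapter", "image_encoder", "projection", "llama")
SCORING_MODULES = ("classification_encoder", "classifier")


@dataclass
class StagePlan:
    name: str                # hypothetical label for the stage
    instruction_goal: str    # what the instructions teach in this stage
    trainable: dict          # module name -> whether it is updated


def network_of(module: str) -> str:
    """Return which network a module belongs to, per the caption."""
    if module in GUIDELINE_MODULES:
        return "guideline"
    if module in SCORING_MODULES:
        return "scoring"
    raise KeyError(f"unknown module: {module}")


def build_stage_plan(stage: int, tune_rest: bool = False) -> StagePlan:
    """Sketch of the instruction-tuning schedule for the guideline network.

    Assumption: `tune_rest` controls the other guideline modules, whose
    trainability is not stated in the caption.
    """
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    trainable = {name: tune_rest for name in GUIDELINE_MODULES}
    # From the caption: adapter trained in stage one, frozen in stage two.
    trainable["domain_adapter"] = stage == 1
    if stage == 1:
        goal = 'distinguish "T2W" and "ADC&DWI" sequences'
        return StagePlan("sequence adaptation", goal, trainable)
    return StagePlan("guideline learning", "learn PICG", trainable)


if __name__ == "__main__":
    for stage in (1, 2):
        plan = build_stage_plan(stage)
        print(stage, plan.name, plan.trainable)
```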