Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation

Hisashi Johno, Yuki Johno, Akitomo Amakawa, Junichi Sato, Ryota Tozuka, Atsushi Komaba, Hiroaki Watanabe, Hiroki Watanabe, Chihiro Goto, Hiroyuki Morisaka, Hiroshi Onishi, Kazunori Nakamoto

TL;DR

This study tackles whether retrieval-augmented generation (RAG) can meaningfully improve LLM-based cancer staging by isolating the effect of RAG from that of model differences. It compares NotebookLM (RAG-enabled) with its internal LLM, Gemini 2.0 Flash, under three conditions that combine the presence or absence of reliable external knowledge (REK) and RAG. The evaluation uses 100 fictional pancreatic cancer cases staged by TNM classification, local invasion factors, and resectability classification. RAG-enabled NotebookLM achieves an overall staging accuracy of 70% and a TNM classification accuracy of 80%, outperforming both non-RAG baselines, and presents explicit REK excerpts with a retrieval accuracy of 92%. The work highlights RAG's potential to support clinical diagnosis with transparent evidence while noting limitations such as misinterpretation and the need for offline, secure deployments for real-world use.

Abstract

Purpose: Retrieval-augmented generation (RAG) is a technology that enhances the functionality and reliability of large language models (LLMs) by retrieving relevant information from reliable external knowledge (REK). RAG has gained interest in radiology, and we previously reported the utility of NotebookLM, an LLM with RAG (RAG-LLM), for lung cancer staging. However, because the comparator LLM in that study differed from NotebookLM's internal model, it remained unclear whether NotebookLM's advantage stemmed from RAG or from inherent model differences. To better isolate the impact of RAG and to assess its utility across different cancers, we compared NotebookLM with its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment.

Materials and Methods: A summary of Japan's pancreatic cancer staging guidelines was used as REK. We compared three groups in staging 100 fictional pancreatic cancer cases based on CT findings: REK+/RAG+ (NotebookLM with REK), REK+/RAG- (Gemini 2.0 Flash with REK), and REK-/RAG- (Gemini 2.0 Flash without REK). Staging criteria included TNM classification, local invasion factors, and resectability classification. In REK+/RAG+, retrieval accuracy was quantified based on whether the retrieved REK excerpts were sufficient for accurate staging.

Results: REK+/RAG+ achieved a staging accuracy of 70%, outperforming REK+/RAG- (38%) and REK-/RAG- (35%). For TNM classification, REK+/RAG+ attained 80% accuracy, exceeding REK+/RAG- (55%) and REK-/RAG- (50%). Additionally, REK+/RAG+ explicitly presented the retrieved REK excerpts, achieving a retrieval accuracy of 92%.

Conclusion: NotebookLM, a RAG-LLM, outperformed its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment, suggesting that RAG may improve the staging accuracy of LLMs. Furthermore, its ability to retrieve and present REK excerpts provides transparency for physicians, highlighting its applicability to clinical diagnosis and classification.
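The three groups differ only in what reaches the model's context: retrieved REK excerpts, the whole REK, or no REK at all. The Python sketch below makes that difference concrete under loud assumptions. NotebookLM's actual retrieval pipeline is not described here, so a toy bag-of-words overlap retriever stands in for it, and the function names (`retrieve`, `build_prompt`) and placeholder REK chunks are hypothetical, not taken from the study.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; deliberately crude.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(rek_chunks: list[str], query: str, k: int = 2) -> list[str]:
    # Toy retriever (an assumption, not NotebookLM's method): rank REK chunks
    # by bag-of-words overlap with the query and keep the top k.
    q = Counter(tokenize(query))

    def overlap(chunk: str) -> int:
        c = Counter(tokenize(chunk))
        return sum(min(q[tok], c[tok]) for tok in q)

    return sorted(rek_chunks, key=overlap, reverse=True)[:k]


def build_prompt(condition: str, findings: str, task: str, rek_chunks: list[str]) -> str:
    # Hypothetical prompt assembly for the three experimental groups.
    if condition == "REK+/RAG+":
        # Only retrieved excerpts enter the context; they can also be shown to the reader.
        context = retrieve(rek_chunks, f"{findings} {task}")
    elif condition == "REK+/RAG-":
        # The entire REK is pasted into the prompt, with no retrieval step.
        context = list(rek_chunks)
    elif condition == "REK-/RAG-":
        # No REK: the model relies on its internal knowledge only.
        context = []
    else:
        raise ValueError(f"unknown condition: {condition}")
    lines = [f"[REK excerpt {i}] {chunk}" for i, chunk in enumerate(context, start=1)]
    lines += [f"CT findings: {findings}", f"Task: {task}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder chunks, not text from the actual guidelines.
    rek = [
        "placeholder text on T factor criteria",
        "placeholder text on N factor criteria",
        "placeholder text on resectability classification criteria",
    ]
    print(build_prompt("REK+/RAG+", "placeholder CT findings",
                       "Determine the resectability classification.", rek))
```

In this toy setup the REK+/RAG- prompt carries every chunk and leaves the model to locate the relevant criteria itself, whereas the REK+/RAG+ prompt carries only the top-k excerpts, which is also what makes it possible to display the supporting evidence alongside the answer.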

Paper Structure

This paper contains 6 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: An overview of the experimental process. Radiologists from our team generated CT findings for 100 fictional pancreatic cancer patients. NotebookLM with REK (REK+/RAG+), Gemini 2.0 Flash with REK (REK+/RAG-), and Gemini 2.0 Flash without REK (REK-/RAG-) conducted cancer staging based on the CT findings in response to Tasks 1 to 5 (see Table 2). In the REK+/RAG+ group, the REK was uploaded to the NotebookLM web system for RAG processing, and the retrieved excerpts from REK were available alongside the classifications. In the REK+/RAG- group, the REK was manually entered into the prompt field before providing Tasks 1 to 5. REK=reliable external knowledge, RAG=retrieval-augmented generation
  • Figure 2: Staging performance of NotebookLM with REK, Gemini 2.0 Flash with REK, and Gemini 2.0 Flash without REK in the experiment using 100 fictional pancreatic cancer cases. Staging was considered accurate if all staging components (TNM classification, local invasion factors, and resectability classification) were correctly determined. For NotebookLM, retrieval accuracy was also evaluated. Retrieval was considered accurate if the retrieved excerpts from REK contained sufficient information to enable accurate cancer staging. REK=reliable external knowledge
  • Figure 3: TNM classification performance of NotebookLM with REK, Gemini 2.0 Flash with REK, and Gemini 2.0 Flash without REK in the experiment using 100 fictional pancreatic cancer cases. The TNM classification was deemed correct only if all T, N, and M factors were accurately identified. Additionally, the classification accuracy for each T, N, and M factor was compared across the three groups. REK=reliable external knowledge
  • Figure 4: The performance of NotebookLM with REK, Gemini 2.0 Flash with REK, and Gemini 2.0 Flash without REK in determining local invasion factors and resectability classification in the experiment using 100 fictional pancreatic cancer cases. REK=reliable external knowledge
  • Figure 5: A representative result from the pancreatic cancer staging experiment (Case 98). In this case, both staging and retrieval by NotebookLM were correct, whereas staging by Gemini 2.0 Flash with and without REK was incorrect. A subset of the REK excerpts retrieved by NotebookLM is available in Supplementary file 5, while the full set (excerpts 1 to 9) can be found in Supplementary file 2. REK=reliable external knowledge
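The scoring rules in the figure captions above are all-or-nothing: a case counts toward staging accuracy only if TNM classification, local invasion factors, and resectability classification are all correct, and TNM counts as correct only if T, N, and M all match. The Python sketch below illustrates those rules, assuming a hypothetical per-case record (`Staging`) whose field names, tuple encoding of local invasion factors, and label strings are illustrative only and not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Staging:
    """Hypothetical per-case staging record; field names are illustrative."""
    t: str
    n: str
    m: str
    local_invasion: tuple[str, ...]  # assumed: fixed-order tuple of factor codes
    resectability: str


def tnm_correct(pred: Staging, ref: Staging) -> bool:
    # TNM is correct only if the T, N, and M factors all match the reference.
    return (pred.t, pred.n, pred.m) == (ref.t, ref.n, ref.m)


def staging_correct(pred: Staging, ref: Staging) -> bool:
    # Overall staging is all-or-nothing: TNM, local invasion factors,
    # and resectability classification must all be correct.
    return (
        tnm_correct(pred, ref)
        and pred.local_invasion == ref.local_invasion
        and pred.resectability == ref.resectability
    )


def accuracy(flags: list[bool]) -> float:
    # Percentage of cases judged correct.
    if not flags:
        raise ValueError("no cases to score")
    return 100.0 * sum(flags) / len(flags)


def factor_accuracy(preds: list[Staging], refs: list[Staging], field: str) -> float:
    # Per-factor accuracy (e.g. field="t"), as compared separately for T, N, and M.
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    return accuracy([getattr(p, field) == getattr(r, field) for p, r in zip(preds, refs)])


if __name__ == "__main__":
    # Placeholder labels only; not cases from the study.
    ref = Staging("T3", "N0", "M0", ("a0", "b1"), "borderline")
    preds = [
        Staging("T3", "N0", "M0", ("a0", "b1"), "borderline"),  # fully correct
        Staging("T3", "N1", "M0", ("a0", "b1"), "borderline"),  # N factor wrong
        Staging("T3", "N0", "M0", ("a0", "b1"), "resectable"),  # resectability wrong
    ]
    refs = [ref] * len(preds)
    print(accuracy([staging_correct(p, r) for p, r in zip(preds, refs)]))  # roughly 33.3
    print(accuracy([tnm_correct(p, r) for p, r in zip(preds, refs)]))
    # Retrieval accuracy is the same percentage taken over reviewer judgments
    # of whether the retrieved excerpts sufficed for accurate staging.
    print(accuracy([True, True, False]))
```

Because the overall criterion requires every component to be right, overall staging accuracy can never exceed TNM accuracy for the same predictions, which is consistent with the pattern in the reported figures.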