Predicting Lung Cancer Patient Prognosis with Large Language Models

Danqing Hu; Bing Liu; Xiang Li; Xiaofeng Zhu; Nan Wu

TL;DR

This study evaluates zero-shot large language models (GPT-4o mini and GPT-3.5) for lung cancer prognosis prediction on two datasets, covering survival at horizons of $N \in \{1,2,3,4,5\}$ years and post-operative complications, and compares them against logistic regression (LR) baselines. Prompts are structured into Role, Task, Patient data, and Instructions parts and require Chain-of-Thought reasoning with JSON output. Evaluation uses 10-fold cross-validation for survival and 5-fold (or 3-fold for rare outcomes) for complications. GPT-4o mini often achieves higher AUROC and AUPRC than GPT-3.5 and LR, with some exceptions (e.g., 3-year survival), and performs robustly across multiple tasks. The results suggest that LLMs can provide prognostic utility without relying on retrospective patient data, and they point to future multimodal integration, such as combining imaging with clinical data, to further support clinical decision-making.
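The prompt structure summarized above (Role, Task, Patient data, Instructions, with step-by-step reasoning and a JSON answer) can be illustrated with a minimal Python sketch. The wording of each prompt part, the patient field names, and the JSON keys `reasoning` and `probability` are assumptions made for illustration; the paper's actual templates appear in its Figure 2 and may differ. No model is called here; the sketch only builds the prompt text and parses a reply.

```python
import json


def build_prompt(patient: dict, horizon_years: int) -> str:
    """Assemble a zero-shot prompt with Role / Task / Patient data / Instructions parts.

    The wording below is hypothetical, not the paper's template.
    """
    patient_lines = "\n".join(f"- {key}: {value}" for key, value in patient.items())
    return (
        "Role: You are an experienced thoracic oncologist.\n"
        f"Task: Estimate the probability that the patient survives {horizon_years} year(s).\n"
        f"Patient data:\n{patient_lines}\n"
        "Instructions: Reason step by step, then answer with a JSON object "
        'containing the keys "reasoning" (string) and "probability" (number from 0 to 1).'
    )


def parse_response(raw: str) -> float:
    """Extract the predicted probability from a reply that contains one JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    payload = json.loads(raw[start:end + 1])  # raises a ValueError subclass on bad JSON
    probability = float(payload["probability"])
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability


if __name__ == "__main__":
    # Hypothetical patient record and a hand-written stand-in for a model reply.
    prompt = build_prompt({"age": 63, "stage": "IIA"}, horizon_years=3)
    print(prompt)
    reply = 'Here is my answer: {"reasoning": "early stage", "probability": 0.8}'
    print(parse_response(reply))
```

Asking for a fixed JSON shape makes the reply machine-readable, so a numeric score can be collected per patient and fed into ranking metrics such as AUROC and AUPRC.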

Abstract

Prognosis prediction is crucial for determining optimal treatment plans for lung cancer patients. Traditionally, such predictions relied on models developed from retrospective patient data. Recently, large language models (LLMs) have gained attention for their ability to process and generate text based on extensive learned knowledge. In this study, we evaluate the potential of GPT-4o mini and GPT-3.5 in predicting the prognosis of lung cancer patients. We collected two prognosis datasets, i.e., survival and post-operative complication datasets, and designed multiple tasks to assess the models' performance comprehensively. Logistic regression models were also developed as baselines for comparison. The experimental results demonstrate that LLMs can achieve competitive, and in some tasks superior, performance in lung cancer prognosis prediction compared to data-driven logistic regression models despite not using additional patient data. These findings suggest that LLMs can be effective tools for prognosis prediction in lung cancer, particularly when patient data is limited or unavailable.
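Comparing an LLM against a logistic regression baseline, as described above, comes down to scoring each model's predicted probabilities against the observed outcomes. The following sketch computes AUROC in plain Python from its pairwise-ranking definition; the labels and scores are toy values invented for illustration, not data or results from the paper, and the function is not the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy outcomes (1 = event observed) and toy predicted probabilities.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.4, 0.35, 0.2]
    print(auroc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The quadratic pairwise loop is chosen for readability; for larger datasets a rank-based implementation or an established library routine would be the usual choice.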

Paper Structure (11 sections, 4 figures, 4 tables)

This paper contains 11 sections, 4 figures, 4 tables.

Introduction
Materials and methods
    Prognosis datasets
    Study design
    Prompt design
    Experimental setup
Results
    Prognosis data
    Predictive performance
Discussion
Conclusion

Figures (4)

Figure 1: Overall study design.
Figure 2: Prompt templates. (a) 1-year survival prediction prompt template, (b) Combined complication prediction prompt template.
Figure 3: The AUROC (a) and AUPRC (b) values of the survival prediction models.
Figure 4: The AUROC (a) and AUPRC (b) values of the post-operative complication prediction models.
