SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting

Shuhao Mei, Yongchao Long, Shan Cao, Xiaobo Han, Shijia Geng, Jinbo Sun, Yuxi Zhou, Shenda Hong

TL;DR

This work proposes SpiroLLM, the first multimodal large language model that can understand spirograms, and demonstrates the substantial potential of deeply fusing physiological signals with large language models, establishing a new paradigm for the next generation of interpretable and reliable clinical decision support tools.

Abstract

Chronic Obstructive Pulmonary Disease (COPD), a major chronic respiratory disease characterized by persistent airflow limitation, is a leading global cause of disability and mortality. Respiratory spirogram time series, routinely collected during pulmonary function tests (PFTs), play a critical role in the early detection of respiratory diseases and in monitoring lung function over time. However, most current AI models for COPD diagnosis only output classification results without providing a rationale for the diagnosis, and current Large Language Models (LLMs) cannot yet interpret spirograms; both limitations severely restrict clinical trust and adoption. To tackle this challenge, we use a cohort of 234,028 individuals from the UK Biobank (UKB) to develop SpiroLLM, the first multimodal large language model that can understand spirograms. The model extracts morphological features from respiratory curves via a SpiroEncoder and aligns them with PFT numerical values in a unified latent space using a SpiroProjector, enabling a large language model to generate a comprehensive diagnostic report. In experiments, SpiroLLM achieved a diagnostic AUROC of 0.8977 (95% CI: 0.88-0.91). In a robustness test with missing core data, it maintained a 100% valid response rate, far exceeding the 13.4% achieved by a text-only model and demonstrating the advantage of its multimodal design. This work demonstrates the substantial potential of deeply fusing physiological signals with large language models, establishing a new paradigm for the next generation of interpretable and reliable clinical decision support tools.
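The abstract describes the SpiroProjector only as mapping SpiroEncoder features into a latent space shared with the language model. A common way to build such a projector in multimodal LLMs is a small MLP whose outputs are used as extra token embeddings placed before the text prompt. The sketch below assumes that design; the class `MLPSpiroProjector`, the function `build_llm_input`, and all dimensions are hypothetical and are not taken from the paper. Weights are randomly initialized, so the example shows only the data flow, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): the SpiroEncoder emits 8 feature
# vectors of width 256, and the LLM's token embeddings have width 1024.
N_SPIRO_TOKENS, ENC_DIM, LLM_DIM = 8, 256, 1024


class MLPSpiroProjector:
    """Hypothetical two-layer MLP mapping encoder features into the
    LLM's token-embedding space (untrained, randomly initialized)."""

    def __init__(self, enc_dim: int, llm_dim: int, hidden: int = 512):
        self.w1 = rng.normal(0.0, 0.02, (enc_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.02, (hidden, llm_dim))
        self.b2 = np.zeros(llm_dim)

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        h = np.maximum(feats @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2


def build_llm_input(spiro_feats, text_embeds, projector):
    """Prepend projected spirogram tokens to the prompt's text embeddings."""
    spiro_tokens = projector(spiro_feats)  # (N_SPIRO_TOKENS, LLM_DIM)
    return np.concatenate([spiro_tokens, text_embeds], axis=0)


spiro_feats = rng.normal(size=(N_SPIRO_TOKENS, ENC_DIM))  # stand-in encoder output
text_embeds = rng.normal(size=(20, LLM_DIM))              # stand-in prompt embeddings
inputs = build_llm_input(spiro_feats, text_embeds, MLPSpiroProjector(ENC_DIM, LLM_DIM))
print(inputs.shape)  # (28, 1024)
```

The language model then attends over the 8 projected spirogram tokens and the 20 text tokens as a single sequence.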

Paper Structure

This paper contains 6 sections, 2 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: This figure compares three workflows for pulmonary function assessment: the traditional clinical model (A), which relies on cumbersome in-clinic testing; the traditional large language model (B), which cannot understand raw physiological signals; and our proposed SpiroLLM framework (C), which supports at-home self-testing and instant generation of professional reports, significantly improving efficiency.
  • Figure 2: Schematic of the overall SpiroLLM architecture, illustrating the end-to-end process from raw pulmonary function test time-series data to a professional diagnostic report. The blue section represents the SpiroEncoder module, which extracts high-level features from the spirometry curves and performs cross-modal alignment with the large language model via the SpiroProjector. The yellow section is the Query Prompt construction module, which combines the COPD probability output by the SpiroEncoder, key PFT parameters extracted by SpiroUtils, and the patient's demographic information into the model's input prompt. The green section represents the gold-standard report generation process. It begins with the Qwen-VL model generating morphological descriptions from the pulmonary function curve images, then incorporates the PFT values extracted by SpiroUtils and adds relevant domain knowledge through a retrieval-augmented generation (RAG) knowledge base. Finally, the DeepSeek V3 model integrates all of this information into a high-quality, standardized diagnostic report, which serves as the training target for SpiroLLM.
  • Figure 3: Performance comparison of different methods. The table reports each metric as mean (95% confidence interval). The proposed SpiroLLM outperforms all baselines, particularly in sensitivity.
  • Figure 4: Interpretable visualization of the SpiroEncoder. Flow-volume curves are color-coded based on attention weights extracted from the final layer of the encoder (red indicates high attention; gray indicates low attention). (a-b) COPD cases: The model exhibits high attention toward the concave descending limb, successfully identifying the characteristic "scooped" pattern associated with airway obstruction. (c-d) Normal cases: Attention remains focused on the descending limb, validating the linear or convex profile associated with normal lung function. This confirms that the model explicitly learns to analyze morphological features indicative of small airway obstruction.
  • Figure 5: Comparative analysis of SpiroLLM and a baseline model. The figure demonstrates SpiroLLM's ability to correctly interpret primary diagnostic criteria, while the baseline model is misled by secondary indicators, resulting in an incorrect diagnosis.
  • ...and 5 more figures
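The Figure 2 caption states that the query prompt combines three inputs: the COPD probability from the SpiroEncoder, key PFT parameters from SpiroUtils, and the patient's demographic information. A minimal sketch of that assembly step follows, assuming a plain-text template. `build_query_prompt`, `Demographics`, the field names, and the example values are hypothetical illustrations, not the paper's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Demographics:
    """Hypothetical demographic fields; the paper does not list which are used."""
    age: int
    sex: str
    height_cm: float


def build_query_prompt(copd_prob: float, pft: dict, demo: Demographics) -> str:
    """Assemble a text prompt from the three inputs named in Figure 2.
    The wording of this template is an assumption, not the paper's."""
    if not 0.0 <= copd_prob <= 1.0:
        raise ValueError("copd_prob must be a probability in [0, 1]")
    pft_lines = "\n".join(f"- {name}: {value}" for name, value in pft.items())
    return (
        f"Patient: {demo.age}-year-old {demo.sex}, height {demo.height_cm:.0f} cm.\n"
        f"SpiroEncoder COPD probability: {copd_prob:.2f}\n"
        f"PFT parameters:\n{pft_lines}\n"
        "Write a pulmonary function diagnostic report."
    )


# Illustrative values only.
prompt = build_query_prompt(
    0.83,
    {"FEV1 (L)": 1.9, "FVC (L)": 3.2, "FEV1/FVC": 0.59},
    Demographics(age=64, sex="male", height_cm=175),
)
print(prompt)
```

Keeping the encoder's probability as an explicit line in the prompt lets the language model cite it in the report alongside the numeric PFT values.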
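The Figure 3 caption reports each metric as a mean with a 95% confidence interval but does not say how the intervals were computed. A percentile bootstrap over test cases is one standard option; the sketch below assumes it and applies it to sensitivity. The function names, the 1,000 resamples, and the synthetic labels are assumptions for illustration only.

```python
import numpy as np


def sensitivity(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """True-positive rate: TP / (TP + FN)."""
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 1))


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Return the bootstrap mean and percentile (1 - alpha) CI of `metric`."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if not np.any(y_true[idx] == 1):
            continue  # sensitivity is undefined without positive cases
        scores.append(metric(y_true[idx], y_pred[idx]))
    scores = np.asarray(scores)
    low, high = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(scores.mean()), float(low), float(high)


# Synthetic labels: predictions agree with the truth about 80% of the time.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=500)
y_pred = np.where(rng.random(500) < 0.8, y_true, 1 - y_true)
mean, low, high = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"Sensitivity {mean:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

Resampling whole cases keeps each prediction paired with its label, so the interval reflects sampling variability of the test set rather than of the model.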
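The Figure 4 caption colors each point of the flow-volume curve by its final-layer attention weight, from gray (low) to red (high). It does not specify how per-point weights are derived from the attention maps, so the sketch below assumes they are averaged over heads and query positions, min-max normalized per curve, and linearly blended between two colors. `reduce_attention`, `attention_to_rgb`, and the RGB values are hypothetical.

```python
import numpy as np

GRAY = np.array([0.75, 0.75, 0.75])  # color for the lowest attention
RED = np.array([0.85, 0.10, 0.10])   # color for the highest attention


def reduce_attention(attn_maps: np.ndarray) -> np.ndarray:
    """(heads, queries, keys) attention maps -> one weight per curve point (key)."""
    return attn_maps.mean(axis=(0, 1))


def attention_to_rgb(weights: np.ndarray) -> np.ndarray:
    """Min-max normalize weights, then blend linearly from GRAY to RED."""
    weights = np.asarray(weights, dtype=float)
    span = weights.max() - weights.min()
    w = (weights - weights.min()) / span if span > 0 else np.zeros_like(weights)
    return (1.0 - w)[:, None] * GRAY + w[:, None] * RED


# Stand-in final-layer attention: 4 heads over a 50-point curve.
rng = np.random.default_rng(0)
attn = rng.random((4, 50, 50))
attn /= attn.sum(axis=-1, keepdims=True)  # rows sum to 1, as after a softmax
colors = attention_to_rgb(reduce_attention(attn))
print(colors.shape)  # (50, 3)
```

The resulting `(N, 3)` array can be passed as the `c` argument of matplotlib's `scatter` together with the volume and flow coordinates of each curve point.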
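The Figure 5 caption contrasts primary diagnostic criteria with secondary indicators without naming them. For context, the GOLD guidelines define persistent airflow limitation by a post-bronchodilator FEV1/FVC ratio below 0.70 and grade severity by FEV1 % predicted only once that criterion is met. The sketch below assumes this is the distinction the figure refers to; `interpret_spirometry` and its helpers are hypothetical and do not reproduce SpiroLLM's reasoning.

```python
from typing import Optional

GOLD_RATIO_THRESHOLD = 0.70  # GOLD fixed-ratio criterion (post-bronchodilator)


def has_airflow_limitation(fev1_l: float, fvc_l: float) -> bool:
    """Primary criterion: FEV1/FVC below the fixed 0.70 threshold."""
    if fvc_l <= 0:
        raise ValueError("FVC must be positive")
    return fev1_l / fvc_l < GOLD_RATIO_THRESHOLD


def gold_grade(fev1_pct_pred: float) -> int:
    """GOLD 1-4 severity grade from FEV1 % predicted."""
    if fev1_pct_pred >= 80:
        return 1
    if fev1_pct_pred >= 50:
        return 2
    if fev1_pct_pred >= 30:
        return 3
    return 4


def interpret_spirometry(fev1_l: float, fvc_l: float, fev1_pct_pred: float) -> Optional[int]:
    """Return a GOLD grade, or None when the primary criterion is not met."""
    if not has_airflow_limitation(fev1_l, fvc_l):
        # A low FEV1 % predicted alone (a secondary indicator) does not establish obstruction.
        return None
    return gold_grade(fev1_pct_pred)


print(interpret_spirometry(1.9, 3.2, 62))  # ratio 0.59 -> 2
print(interpret_spirometry(2.0, 2.5, 60))  # ratio 0.80 -> None
```

Gating on the ratio first mirrors the failure mode in the caption: a model that reacts to a reduced FEV1 % predicted without checking FEV1/FVC can label a non-obstructive pattern as COPD.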