Towards Holistic Disease Risk Prediction using Small Language Models

Liv Björkdahl, Oskar Pauli, Johan Östman, Chiara Ceccobello, Sara Lundell, Magnus Kjellberg

TL;DR

This work tackles holistic disease risk prediction by leveraging small language models (SLMs) to jointly reason over multimodal healthcare data. It introduces a model-agnostic framework that projects modality-specific embeddings into a frozen LLM's token space via regularized autoencoder projectors, enabling single-model multi-task disease risk prediction. Evaluated on the MIMIC-IV ICU dataset across 12 tasks, using procedure/lab/chart time-series, chest X-ray images, and radiology text, the approach achieves competitive performance and demonstrates the potential of SLMs for multimodal reasoning in healthcare, while highlighting trade-offs relative to single-task baselines like XGBoost. The results suggest that joint training across modalities improves recall and that such multimodal fusion can generalize across data sources, modalities, and tasks, offering a scalable path toward holistic disease risk prediction in clinical practice.
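As a rough illustration of the projection idea described above, the following minimal sketch maps one modality embedding to a few "soft tokens" in a language model's embedding space and computes a reconstruction penalty. It rests on assumptions not confirmed by the source: linear encoder and decoder layers, a mean-squared reconstruction error as the regularizer, and the hypothetical names `AutoencoderProjector` and `build_input_sequence`. The dimensions are placeholders, and the frozen language model itself is not included.

```python
import random


def init_linear(n_out, n_in, rng):
    """Randomly initialise a weight matrix (list of rows) and a zero bias."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Compute y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class AutoencoderProjector:
    """Hypothetical projector: modality embedding -> soft tokens -> reconstruction."""

    def __init__(self, modality_dim, token_dim, num_tokens, seed=0):
        rng = random.Random(seed)
        self.token_dim = token_dim
        self.num_tokens = num_tokens
        # Encoder maps into the (assumed) token embedding space of the language model.
        self.enc = init_linear(num_tokens * token_dim, modality_dim, rng)
        # Decoder maps back to the modality space; used only for the regularizer.
        self.dec = init_linear(modality_dim, num_tokens * token_dim, rng)

    def project(self, embedding):
        """Return num_tokens vectors, each of size token_dim."""
        flat = linear(*self.enc, embedding)
        d = self.token_dim
        return [flat[i * d:(i + 1) * d] for i in range(self.num_tokens)]

    def reconstruction_loss(self, embedding):
        """Mean squared error between the input and its reconstruction (assumed regularizer)."""
        flat = [v for token in self.project(embedding) for v in token]
        recon = linear(*self.dec, flat)
        return sum((r - e) ** 2 for r, e in zip(recon, embedding)) / len(embedding)


def build_input_sequence(projected_sources, text_token_embeddings):
    """Concatenate soft tokens from every source with ordinary text token embeddings."""
    sequence = [token for source in projected_sources for token in source]
    return sequence + list(text_token_embeddings)


if __name__ == "__main__":
    projector = AutoencoderProjector(modality_dim=8, token_dim=4, num_tokens=2)
    embedding = [0.1 * i for i in range(8)]  # stand-in for a modality encoder output
    soft_tokens = projector.project(embedding)
    sequence = build_input_sequence([soft_tokens], [[0.0] * 4])
    print(len(sequence), projector.reconstruction_loss(embedding))
```

In a setup like this, only the projector weights would be trained while the language model stays frozen, so adding a new data source would amount to adding another projector.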

Abstract

Data in the healthcare domain arise from a variety of sources and modalities, such as X-ray images, continuous measurements, and clinical notes. Medical practitioners integrate these diverse data types daily to make informed and accurate decisions. With recent advancements in language models capable of handling multimodal data, it is a logical progression to apply these models to the healthcare sector. In this work, we introduce a framework that connects small language models to multiple data sources, aiming to predict the risk of various diseases simultaneously. Our experiments encompass 12 different tasks within a multitask learning setup. Although our approach does not surpass state-of-the-art methods specialized for single tasks, it demonstrates competitive performance and underscores the potential of small language models for multimodal reasoning in healthcare.

Paper Structure (14 sections, 5 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Joint training of projectors for $x$ containing $S$ sources based on the asymmetric loss \ref{eq:asl}. The pipeline shown for $x^1$ is replicated for all $x^s$, $s\in [S]$. The snowflake symbolizes that model weights are frozen.
  • Figure 2: Data extraction process for a given patient $p$.
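
The caption of Figure 1 refers to an asymmetric loss whose equation is not reproduced here. As a hedged sketch, the block below implements the asymmetric loss as it is commonly defined for multi-label classification, with separate focusing exponents for positive and negative labels and a probability margin for negatives. Whether the paper uses exactly this form is an assumption, and the default values of `gamma_pos`, `gamma_neg`, and `margin` are illustrative rather than taken from the source.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Mean asymmetric loss over binary labels (assumed form, not confirmed by the source).

    probs   -- predicted probabilities in [0, 1], one per task or label
    targets -- ground-truth labels, each 0 or 1
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: focal-style weighting with exponent gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative term: shift the probability by the margin, clipping at 0,
            # so that easy negatives contribute nothing to the loss.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total / len(probs)


if __name__ == "__main__":
    # Three hypothetical disease-risk predictions with their labels.
    print(asymmetric_loss([0.9, 0.2, 0.03], [1, 0, 0]))
```

With a larger `gamma_neg` than `gamma_pos`, confidently rejected negatives are down-weighted more strongly than positives, which is one way such a loss can favor recall on rare positive labels.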