Multimodal Survival Analysis with Locally Deployable Large Language Models

Moritz Gögl, Christopher Yau

Abstract

We study multimodal survival analysis integrating clinical text, tabular covariates, and genomic profiles using locally deployable large language models (LLMs). As many institutions face tight computational and privacy constraints, this setting motivates the use of lightweight, on-premises models. Our approach jointly estimates calibrated survival probabilities and generates concise, evidence-grounded prognosis text via teacher-student distillation and principled multimodal fusion. On a TCGA cohort, it outperforms standard baselines, avoids reliance on cloud services and associated privacy concerns, and reduces the risk of hallucinated or miscalibrated estimates that can be observed in base LLMs.

Paper Structure (26 sections, 6 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of the proposed framework. A compact causal LLM encodes the input text into hidden embeddings used by a survival model (here, CoxPH) and produces verbalized survival estimates with explanatory text. Covariates and gene expression are fused either early (A) or late (B).
  • Figure 2: Teacher pipeline for constructing the student target. (1) Numeric prompting: the teacher is asked for survival probabilities at 1, 3, and 5 years given the input text; the probabilities are extracted, and a parametric curve (exponential by default) is fitted to them to obtain the 3-year value. (2) Explanation prompting conditioned on the text and the rounded 3-year percentage; the explanation plus a marked probability sentence form the student’s training target.
  • Figure 3: Qualitative example: original report (left) and generated assessment (right). Consistent evidence spans are highlighted in the same color.
  • Figure A.1: Additional qualitative example: original report (left) and student-generated assessment (right). Consistent evidence spans are highlighted with the same color; missing but clinically relevant information is framed in orange. The explanation is reasonable and faithful to the report, but the 3-year survival probability (65%) is clearly over-optimistic.
  • Figure A.2: Two negative examples of student-generated assessments. Both outputs exhibit degraded English fluency; the right example further fails to provide an explicit verbalized probability.
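
The curve-fitting step in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the exponential model S(t) = exp(-λt) and a least-squares fit through the origin on the log scale; the caption only says "exponential by default", so the fitting criterion, the function names, and the example probabilities below are all hypothetical and not taken from the paper.

```python
import math

def fit_exponential_rate(times, probs, eps=1e-6):
    """Fit lam in S(t) = exp(-lam * t) by least squares on -log S(t) = lam * t.

    Assumption: a through-the-origin fit on the log scale; the paper may
    use a different criterion.
    """
    num = 0.0
    den = 0.0
    for t, p in zip(times, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log(p) finite and lam positive
        num += t * (-math.log(p))
        den += t * t
    return num / den

def survival_at(lam, t):
    """Survival probability at time t under the fitted exponential curve."""
    return math.exp(-lam * t)

# Hypothetical teacher outputs at 1, 3, and 5 years (illustrative values only).
times = [1.0, 3.0, 5.0]
probs = [0.90, 0.72, 0.60]

lam = fit_exponential_rate(times, probs)
s3 = survival_at(lam, 3.0)
rounded_pct = round(100 * s3)  # rounded 3-year percentage passed to step (2)
```

Reading the 3-year value off the fitted curve, rather than using the raw 3-year answer, smooths over inconsistencies between the three elicited numbers, since a single-parameter curve is monotonically decreasing by construction.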