LLM enhanced graph inference for long-term disease progression modelling

Tiantian He, An Zhao, Elinor Thompson, Anna Schroder, Ahmed Abdulaal, Frederik Barkhof, Daniel C. Alexander

TL;DR

This work tackles long-term neurodegenerative disease progression modelling by jointly learning continuous biomarker trajectories and a biologically constrained interaction graph from irregularly sampled longitudinal data. It introduces an LLM-guided framework that generates probabilistic, multi-modal brain graphs, filters them to a sparse graph, and embeds the result in a diffusion-like dynamical system fitted to tau-pathology trajectories. A dual optimization scheme combines LLM-derived priors with data-driven refinement of the graph weights, yielding better predictive accuracy and interpretability than conventional connectome-based or purely data-driven approaches, as demonstrated on tau-PET data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The framework also offers mechanistic insight beyond standard connectivity maps by surfacing novel links together with explainable reasoning from the LLMs, enabling more robust disease staging and potential generalization to other domains.
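The graph pipeline summarised above (probabilistic graph, sparsity filtering, diffusion-like dynamics) can be illustrated with a small NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: edge probabilities are assumed to be the fraction of (hypothetical) LLM responses that propose each edge, filtering is assumed to keep the k most probable edges, and the "diffusion-like" system is assumed to be plain linear network diffusion, dx/dt = -βLx with graph Laplacian L = D - A. The function names `edge_probabilities`, `filter_edges` and `diffuse`, and the 4-region toy graph, are illustrative only.

```python
import numpy as np

def edge_probabilities(answers):
    """Average binary adjacency matrices, one per hypothetical LLM response.

    Each entry becomes the fraction of responses proposing that edge, so pooling
    responses from several LLMs gives a simple (assumed) 'mixture' graph.
    """
    return np.mean(np.stack(answers).astype(float), axis=0)

def filter_edges(prob, k):
    """Keep the k most probable undirected edges; the diagonal is ignored."""
    n = prob.shape[0]
    iu, ju = np.triu_indices(n, k=1)                    # region pairs with i < j
    top = np.argsort(-prob[iu, ju], kind="stable")[:k]  # indices of the k largest
    adj = np.zeros((n, n))
    adj[iu[top], ju[top]] = 1.0
    return adj + adj.T                                  # symmetric adjacency

def diffuse(adj, x0, beta, t):
    """Exact solution of dx/dt = -beta * L x, with graph Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    evals, evecs = np.linalg.eigh(lap)                  # L is symmetric, so eigh applies
    return evecs @ (np.exp(-beta * evals * t) * (evecs.T @ x0))

# Three hypothetical LLM answers for a 4-region toy brain (1 = "these regions interact").
a1 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
a2 = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
a3 = np.array([[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1], [0, 1, 1, 0]])
prob = edge_probabilities([a1, a2, a3])
adj = filter_edges(prob, k=3)           # keeps edges 0-1, 2-3 (p = 1) and 1-2 (p = 2/3)
x0 = np.array([1.0, 0.0, 0.0, 0.0])     # pathology seeded in region 0
x_t = diffuse(adj, x0, beta=0.5, t=2.0)
x_late = diffuse(adj, x0, beta=0.5, t=1000.0)
```

Because every column of L sums to zero, total pathology is conserved and, on a connected graph, spreads toward a uniform level (`x_late` is close to 0.25 in every region). Network models of protein spread frequently add a local growth term (for example a logistic one), which this sketch omits.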

Abstract

Understanding the interactions between biomarkers across brain regions during neurodegenerative disease is essential for unravelling the mechanisms underlying disease progression. For example, pathophysiological models of Alzheimer's Disease (AD) typically describe how variables, such as regional levels of toxic proteins, interact spatiotemporally within a dynamical system driven by an underlying biological substrate, often based on brain connectivity. However, current methods grossly oversimplify the complex relationships between brain regions by assuming a single-modality brain connectome as the disease-spreading substrate. This leads to inaccurate predictions of pathology spread, especially over long-term progression. Meanwhile, methods that learn such a graph in a purely data-driven way suffer from poor identifiability because they lack proper constraints. We therefore present a novel framework that uses Large Language Models (LLMs) as expert guides on the interactions between regional variables to enhance learning of disease progression from irregularly sampled longitudinal patient data. By leveraging LLMs' ability to synthesize multi-modal relationships and incorporate diverse disease-driving mechanisms, our method simultaneously optimizes 1) the construction of long-term disease trajectories from individual-level observations and 2) the biologically constrained graph structure that captures interactions among brain regions, with better identifiability. We demonstrate the new approach by estimating pathology propagation from tau-PET imaging data in an AD cohort. The new framework achieves higher prediction accuracy and better interpretability than traditional approaches while revealing additional disease-driving factors beyond conventional connectivity measures.
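The joint optimisation of trajectories and graph can be pictured as an alternating scheme: fix the graph and place each subject on the modelled trajectory, then fix those placements and refine the graph weights. The toy below is a sketch under explicit assumptions, not the paper's algorithm: it uses linear network diffusion on a fixed, already-filtered edge set, grid search for subject placement, and a finite-difference gradient step with backtracking for the weights; the LLM prior itself is not modelled. All names (`trajectory`, `fit_times`, `alternate`, and so on) are hypothetical.

```python
import numpy as np

def weights_to_matrix(w, edges, n):
    """Symmetric weighted adjacency matrix built from per-edge weights."""
    a = np.zeros((n, n))
    for wk, (i, j) in zip(w, edges):
        a[i, j] = a[j, i] = wk
    return a

def trajectory(w, edges, n, x0, times):
    """States of dx/dt = -L x (L = D - W) at each time; shape (len(times), n)."""
    adj = weights_to_matrix(w, edges, n)
    evals, evecs = np.linalg.eigh(np.diag(adj.sum(axis=1)) - adj)
    coeffs = evecs.T @ x0
    return np.stack([evecs @ (np.exp(-evals * t) * coeffs) for t in times])

def fit_times(obs, w, edges, n, x0, grid):
    """Trajectory step: place each subject at the grid time whose model state is closest."""
    curve = trajectory(w, edges, n, x0, grid)                        # (G, n)
    dist = ((obs[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)  # (S, G)
    return grid[dist.argmin(axis=1)]

def loss(w, edges, n, x0, obs, times):
    """Total squared error between observations and the model at the given times."""
    return float(((obs - trajectory(w, edges, n, x0, times)) ** 2).sum())

def alternate(obs, edges, n, x0, w, grid, n_iter=20, lr=0.1, eps=1e-5):
    times = fit_times(obs, w, edges, n, x0, grid)
    history = [loss(w, edges, n, x0, obs, times)]
    for _ in range(n_iter):
        # Graph step: refine edge weights with subject times held fixed.
        base = history[-1]
        grad = np.array([(loss(w + eps * e, edges, n, x0, obs, times) - base) / eps
                         for e in np.eye(len(w))])       # forward differences
        step = lr
        while step > 1e-8:                               # backtracking line search
            w_new = np.clip(w - step * grad, 0.0, None)  # keep weights non-negative
            if loss(w_new, edges, n, x0, obs, times) <= base:
                w = w_new
                break
            step /= 2
        # Trajectory step: re-place subjects on the updated trajectory.
        times = fit_times(obs, w, edges, n, x0, grid)
        history.append(loss(w, edges, n, x0, obs, times))
    return w, times, history

# Toy data: 4 regions on a chain, 5 subjects each observed once at an unknown time.
edges, n = [(0, 1), (1, 2), (2, 3)], 4
x0 = np.array([1.0, 0.0, 0.0, 0.0])                      # pathology seeded in region 0
grid = np.linspace(0.0, 10.0, 101)
true_w = np.array([1.0, 0.5, 0.8])
obs = trajectory(true_w, edges, n, x0, grid[[5, 20, 40, 60, 90]])
w_fit, t_fit, hist = alternate(obs, edges, n, x0, np.full(3, 0.2), grid)
```

Each half-step can only lower the squared error (the placement search includes the current times, and a weight step is accepted only if it does not increase the loss), so `hist` is non-increasing. Scaling every weight by c and every time by 1/c leaves the fit unchanged, a small instance of the identifiability problem mentioned above; here only the bounded time grid partly pins that scale.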

Paper Structure

This paper contains 28 sections, 4 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Model overview. The proposed framework constructs a full disease progression process from snapshots by iteratively estimating subject locations and the embedded graph. The graph plays a dominant role in shaping the disease trajectory. Graph inference consists of LLM querying, graph filtering, and data-driven learning of the graph weights.
  • Figure 2: Different graph learning outputs from the data-driven model NGM.
  • Figure 3: Model performance: correlation R on the test set vs. number of parameters (left); AIC on the training set vs. number of parameters (right). The dashed vertical lines mark the critical edge number for each LLM. The graph obtained from the mixture of LLMs gives the lowest AIC at the smallest number of parameters, followed by Claude 3.5. As the number of learnable parameters increases, all models converge to the same performance level. The LLM-based graphs let the model retain high performance at much greater sparsity than the connectivity-based graphs.
  • Figure 4: Verification of the LLM graph
  • Figure 5: One representative output from Claude 3.5. Factors 6-10 (in red) were not mentioned in the prompt.
  • ...and 6 more figures
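Figure 3 compares graphs by AIC on the training set against the number of learnable parameters, and by correlation R on the test set. Assuming i.i.d. Gaussian residuals (the paper's exact likelihood is not given here), AIC = 2k - 2 ln L_hat reduces to n ln(RSS/n) + 2k plus a constant shared by all models fitted to the same data, so an extra edge is only worth keeping if it lowers n ln(RSS/n) by more than 2. The helper names below are hypothetical, and reading R as Pearson's correlation is also an assumption.

```python
import numpy as np

def gaussian_aic(residuals, k):
    """AIC for i.i.d. Gaussian residuals with the noise variance profiled out.

    2k - 2 ln L_hat = n ln(RSS / n) + 2k + n (ln(2 pi) + 1); the last term is
    identical for every model fitted to the same n points, so it is dropped.
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    rss = float(np.sum(r ** 2))
    return n * np.log(rss / n) + 2 * k

def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(pred, obs)[0, 1])

# Same residuals, more parameters: AIC rises by 2 per extra parameter.
res = [1.0, -1.0, 1.0, -1.0]          # RSS = 4 and n = 4, so n ln(RSS/n) = 0
aic_small = gaussian_aic(res, k=3)    # 6.0
aic_large = gaussian_aic(res, k=5)    # 10.0
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # 1 up to rounding
```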