Tabular LLMs for Interpretable Few-Shot Alzheimer's Disease Prediction with Multimodal Biomedical Data

Sophie Kearney; Shu Yang; Zixuan Wen; Weimin Lyu; Bojian Hou; Duy Duong-Tran; Tianlong Chen; Jason H. Moore; Marylyn D. Ritchie; Chao Chen; Li Shen

Abstract

Accurate diagnosis of Alzheimer's disease (AD) requires handling tabular biomarker data, yet such data are often small and incomplete, a regime in which deep learning models frequently fail to outperform classical methods. Pretrained large language models (LLMs) offer few-shot generalization, structured reasoning, and interpretable outputs, making them a promising alternative for clinical prediction. We propose TAP-GPT (Tabular Alzheimer's Prediction GPT), a domain-adapted tabular LLM framework built on TableGPT2 and fine-tuned for few-shot AD classification using tabular prompts rather than plain text. We evaluate TAP-GPT on binary AD classification across four ADNI-derived datasets: QT-PAD biomarkers and region-level structural MRI, amyloid PET, and tau PET. Across multimodal and unimodal settings, TAP-GPT improves upon its backbone models and outperforms traditional machine learning baselines in the few-shot setting while remaining competitive with state-of-the-art general-purpose LLMs. We show that feature selection mitigates performance degradation on high-dimensional inputs and that TAP-GPT maintains stable performance under simulated and real-world missingness without imputation. Additionally, TAP-GPT produces structured, modality-aware reasoning aligned with established AD biology and shows greater stability under self-reflection, supporting its use in iterative multi-agent systems. To our knowledge, this is the first systematic application of a tabular-specialized LLM to multimodal biomarker-based AD prediction. It demonstrates that such pretrained models can effectively address structured clinical prediction tasks and lays the foundation for tabular LLM-driven multi-agent clinical decision-support systems. The source code is publicly available on GitHub: https://github.com/sophie-kearney/TAP-GPT.

Paper Structure (25 sections, 12 figures)

This paper contains 25 sections, 12 figures.

Introduction
Methods
Datasets
Quantitative Templates for the Progression of AD Biomarker Data
Regional Summary Imaging Data
Creating Tabular Prompts
TAP-GPT Framework
Experimental Setup
Results
Overall Model Performance
Model Performance on Biomarker Data
Model Performance on Imaging Data
Ablations
k-Ablation Analysis for Biomarker Data
k- and p-Ablation Analysis for Imaging Data
...and 10 more sections

Figures (12)

Figure 1: Overview of the TAP-GPT framework. For each dataset, we split the subjects into training, testing, and pools for in-context examples, with an optional feature selection step. We construct tables for finetuning and evaluating TAP-GPT for the task of Alzheimer's disease prediction.
Figure 2: The four prompt formats used in our experiments, with tabular prompts shown in green and serialized prompts in blue. In each case, the model is asked to predict Alzheimer’s disease status for the same held-out patient. All values shown are synthetic with abbreviated prompts.
Figure 3: QT-PAD mean F1 across models in zero-shot and few-shot ($k=8$) contexts. LLMs use tabular (green) and serialized (blue) prompts with error bars for standard deviation; TabPFN and traditional ML (yellow) operate directly on structured data.
Figure 4: Imaging ROI mean F1 across models in the zero-shot and few-shot ($k=4$) contexts. LLMs receive tabular and serialized prompts, while traditional ML and TabPFN models operate directly on the data. Model performance is evaluated across three imaging modalities: amyloid PET, tau PET, and structural MRI.
Figure 5: Ablation of the number of in-context learning (ICL) examples ($k$) on QT-PAD data across TableGPT2, TabPFN, and TAP-GPT. TabPFN performance steadily improved with larger $k$, TableGPT2 improved up to $k=6$ and declined thereafter, and TAP-GPT peaked at $k=8$.
...and 7 more figures
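
The distinction in Figure 2 between tabular and serialized prompts can be illustrated with a minimal sketch. It is not the TAP-GPT prompt template, which this page does not reproduce: the function names, the Markdown table layout, the sentence pattern, the feature names (`age`, `hippocampus_volume`), and all values are hypothetical and synthetic. The sketch only shows the general idea of presenting $k$ labeled in-context patients plus one unlabeled query patient either as table rows or as text.

```python
from typing import Dict, List

Row = Dict[str, object]

def tabular_prompt(examples: List[Row], query: Row, label: str = "AD") -> str:
    """Render k labeled rows plus one unlabeled query row as a Markdown table."""
    # Feature columns come from the query; the label column is placed last.
    columns = [c for c in query if c != label] + [label]
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "|".join(" --- " for _ in columns) + "|"
    lines = [header, divider]
    # The query row carries "?" where the model is expected to predict.
    for row in examples + [{**query, label: "?"}]:
        lines.append("| " + " | ".join(str(row[c]) for c in columns) + " |")
    return "\n".join(lines)

def serialized_prompt(examples: List[Row], query: Row, label: str = "AD") -> str:
    """Render the same patients as one plain-text sentence per patient."""
    features = [c for c in query if c != label]

    def describe(row: Row) -> str:
        return ", ".join(f"{c} is {row[c]}" for c in features)

    lines = [f"Patient: {describe(r)}. {label}: {r[label]}." for r in examples]
    lines.append(f"Patient: {describe(query)}. {label}: ?")
    return "\n".join(lines)

# Synthetic, hypothetical values (k = 2 in-context examples).
EXAMPLES: List[Row] = [
    {"age": 71, "hippocampus_volume": 3.4, "AD": 0},
    {"age": 79, "hippocampus_volume": 2.6, "AD": 1},
]
QUERY: Row = {"age": 74, "hippocampus_volume": 2.9}
```

Calling `tabular_prompt(EXAMPLES, QUERY)` and `serialized_prompt(EXAMPLES, QUERY)` yields two prompts that carry the same information for the same held-out patient, which mirrors the comparison the figure describes.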
