What Do LLMs Know About Alzheimer's Disease? Fine-Tuning, Probing, and Data Synthesis for AD Detection
Lei Jiang, Yue Zhou, Natalie Parde
TL;DR
The paper tackles early Alzheimer's disease (AD) detection under limited labeled data. It fine-tunes LLMs on AD transcripts and uses linear probes to identify how AD signals are encoded within their internal representations. It finds that, after fine-tuning, AD-related information concentrates in specific words and special markers. This motivates a data-synthesis approach that uses T5 to generate diagnostically informative synthetic samples guided by a set of AD linguistic markers. The paper makes four main contributions: effective supervised fine-tuning (SFT) for AD detection, a linear-probe framework to quantify AD-related information, token-level probing to pinpoint informative linguistic elements, and a marker-driven data-synthesis method to augment training data. The findings advance practical AD detectors and offer a framework for probing and augmenting low-resource clinical NLP tasks with LLMs, with implications for safer and more scalable deployment in healthcare.
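The linear-probe idea can be illustrated with a small sketch. It assumes that one pooled activation vector per transcript has already been extracted from each transformer layer. It then fits a logistic-regression probe per layer and reports held-out accuracy as a rough measure of how linearly decodable the AD label is at that layer. Several details are hypothetical: the function names (train_linear_probe, layerwise_probe), the hyperparameters, the 70/30 split, and the toy activations. This is not the authors' implementation, and its numbers say nothing about the paper's results.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe (w, b) with full-batch gradient descent."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = sigmoid(z) - y  # d(binary cross-entropy)/dz
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose predicted class (z > 0 means AD) matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)

def layerwise_probe(activations_by_layer, labels, train_frac=0.7):
    """Train one probe per layer (hypothetical helper); return held-out accuracy per layer."""
    split = int(len(labels) * train_frac)
    scores = {}
    for layer, feats in activations_by_layer.items():
        w, b = train_linear_probe(feats[:split], labels[:split])
        scores[layer] = probe_accuracy(w, b, feats[split:], labels[split:])
    return scores

# Toy activations (hypothetical): layer 0 is pure noise, while in layer 1 the
# first dimension is shifted by the label (1 = AD, 0 = control).
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
acts = {
    0: [[rng.gauss(0, 1) for _ in range(4)] for _ in labels],
    1: [[rng.gauss(2.0 * y - 1.0, 0.4)] + [rng.gauss(0, 1) for _ in range(3)]
        for y in labels],
}
scores = layerwise_probe(acts, labels)
for layer, acc in sorted(scores.items()):
    print(f"layer {layer}: held-out probe accuracy = {acc:.2f}")
```

In practice, the probes would be fit on real activations from both the base and fine-tuned models and compared layer by layer. A library logistic-regression implementation would normally replace the hand-written gradient descent, which is kept here only so the sketch has no dependencies.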
Abstract
Reliable early detection of Alzheimer's disease (AD) is challenging, particularly due to limited availability of labeled data. While large language models (LLMs) have shown strong transfer capabilities across domains, adapting them to the AD domain through supervised fine-tuning remains largely unexplored. In this work, we fine-tune an LLM for AD detection and investigate how task-relevant information is encoded within its internal representations. We employ probing techniques to analyze intermediate activations across transformer layers, and we observe that, after fine-tuning, the probing values of specific words and special markers change substantially, indicating that these elements play a crucial role in the model's improved detection performance. Guided by this insight, we design a curated set of task-aware special markers and train a sequence-to-sequence model as a data-synthesis tool that leverages these markers to generate structurally consistent and diagnostically informative synthetic samples. We evaluate the synthesized data both intrinsically and by incorporating it into downstream training pipelines.
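The token-level comparison described above can be sketched as follows, under several stated assumptions. First, the "probing value" of a token is taken to be the probe's predicted AD probability for that token's hidden state. Second, each token type is represented by a single (e.g. occurrence-averaged) hidden state. Third, tokens are ranked by how much that value shifts between the base and fine-tuned models. The marker name "[PAUSE]", the helper names, the probe weights, and the two-dimensional states are hypothetical toy values, not data or definitions from the paper.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def token_probe_values(token_states, probe):
    """Apply a linear probe (w, b) to each token's hidden state; return P(AD) per token."""
    w, b = probe
    return {
        tok: sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
        for tok, h in token_states.items()
    }

def rank_by_probe_shift(base_states, tuned_states, base_probe, tuned_probe):
    """Rank tokens by the absolute change in probe value after fine-tuning."""
    base = token_probe_values(base_states, base_probe)
    tuned = token_probe_values(tuned_states, tuned_probe)
    shifts = {tok: tuned[tok] - base[tok] for tok in base if tok in tuned}
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy setup: one probe is reused for both models for simplicity.
probe = ([1.0, -1.0], 0.0)
base_states = {"[PAUSE]": [0.1, 0.0], "uh": [0.0, 0.1], "the": [0.2, 0.2]}
tuned_states = {"[PAUSE]": [2.5, 0.0], "uh": [1.5, 0.2], "the": [0.2, 0.2]}

ranking = rank_by_probe_shift(base_states, tuned_states, probe, probe)
for tok, shift in ranking:
    print(f"{tok:>8}  {shift:+.3f}")
```

Reusing one probe for both models is a simplification. If separate probes are trained on the base and fine-tuned representations, both still output probabilities, so the per-token shifts remain directly comparable.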
