What Do LLMs Know About Alzheimer's Disease? Fine-Tuning, Probing, and Data Synthesis for AD Detection

Lei Jiang; Yue Zhou; Natalie Parde

TL;DR

The paper addresses early Alzheimer's disease (AD) detection under limited labeled data by fine-tuning LLMs on AD transcripts and using linear probes to examine how AD signals are encoded in the model's representations. It finds that, after fine-tuning, AD-related information concentrates in specific words and special markers. This observation motivates a data-synthesis approach in which T5, guided by a set of AD linguistic markers, generates diagnostically informative synthetic samples. The paper makes four main contributions: effective supervised fine-tuning (SFT) for AD detection, a linear-probe framework to quantify AD-related information, token-level probing to pinpoint informative linguistic elements, and a marker-driven data-synthesis method to augment training data. Together, these results support more practical AD detectors and offer a framework for probing and augmenting low-resource clinical NLP tasks with LLMs.

Abstract

Reliable early detection of Alzheimer's disease (AD) is challenging, particularly due to limited availability of labeled data. While large language models (LLMs) have shown strong transfer capabilities across domains, adapting them to the AD domain through supervised fine-tuning remains largely unexplored. In this work, we fine-tune an LLM for AD detection and investigate how task-relevant information is encoded within its internal representations. We employ probing techniques to analyze intermediate activations across transformer layers, and we observe that, after fine-tuning, the probing values of specific words and special markers change substantially, indicating that these elements assume a crucial role in the model's improved detection performance. Guided by this insight, we design a curated set of task-aware special markers and train a sequence-to-sequence model as a data-synthesis tool that leverages these markers to generate structurally consistent and diagnostically informative synthetic samples. We evaluate the synthesized data both intrinsically and by incorporating it into downstream training pipelines.
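
The probing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hidden states here are synthetic NumPy arrays standing in for transformer activations, and the names `train_linear_probe` and `token_probing_values` are hypothetical. The sketch fits a logistic-regression probe on pooled representations, then projects individual token representations through the same probe to obtain per-token probing values.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression probe (w, b) with plain gradient descent."""
    n_samples, hidden_dim = features.shape
    w = np.zeros(hidden_dim)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the cross-entropy loss w.r.t. logits
        w -= lr * (features.T @ grad) / n_samples
        b -= lr * grad.mean()
    return w, b


def token_probing_values(token_states, w, b):
    """Project each token representation through the trained probe."""
    return token_states @ w + b


# Synthetic stand-in for pooled hidden states (assumption: real inputs
# would be activations taken from one layer of the fine-tuned model).
rng = np.random.default_rng(0)
hidden_dim = 16
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=200)
features = rng.normal(size=(200, hidden_dim)) + 2.0 * np.outer(2 * labels - 1, direction)

w, b = train_linear_probe(features, labels)
accuracy = ((features @ w + b > 0).astype(int) == labels).mean()

# Per-token scores for one hypothetical 5-token transcript.
token_states = rng.normal(size=(5, hidden_dim))
values = token_probing_values(token_states, w, b)
```

In this sketch, a large positive or negative entry in `values` marks a token whose representation lies far along the direction the probe learned. Comparing such values before and after fine-tuning is one way to see which tokens gain task-relevant signal.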

Paper Structure (28 sections, 10 equations, 7 figures, 3 tables)

Introduction
Related Work
Alzheimer's Disease Detection
Supervised Fine-tuning
Linear probes
Methodology
Problem Description
Supervised Finetuning Loss
Standard SFT (Cross-Entropy Loss).
Standard SFT with Contrastive Loss.
Focal Loss.
Label Smoothing.
Probing
Experiments
Experimental Settings
...and 13 more sections

Figures (7)

Figure 1: We obtain the representation of each token and project it with the linear probe. Tokens marked in blue are special markers that capture critical aspects of speech.
Figure 2: Progression of probe performance metrics across successive model layers.
Figure 3: Token probing value distribution.
Figure 4: Difference in token probing values before and after fine-tuning.
Figure 5: T5 marker pipeline
...and 2 more figures
