
Understanding Cross Task Generalization in Handwriting-Based Alzheimer's Screening via Vision Language Adaptation

Changqing Gong, Huafeng Qin, Mounim A. El-Yacoubi

TL;DR

This work addresses cross-task generalization in handwriting-based Alzheimer's disease (AD) screening by repurposing a CLIP-based vision-language model through Cross-Layer Fusion Adapters (CLFA). CLFA progressively aligns visual features with handwriting-specific cues via multi-level, tokenwise depthwise fusion and residual retargeting, enabling prompt-free zero-shot inference on unseen handwriting tasks. In experiments on the DARWIN-RAW dataset, CLFA outperforms an MVFA baseline in cross-task AUC and reveals structured task relationships that link visuospatial, motor, and language processes to AD signatures. These results offer a scalable framework and practical guidance for designing robust handwriting-based digital biomarkers that generalize across diverse handwriting tasks.
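The TL;DR names tokenwise depthwise fusion, residual retargeting, and cross-layer fusion without giving formulas. The sketch below is one plausible reading of those terms, not the authors' CLFA implementation. It assumes that one ViT layer's output is a tokens x channels matrix, that each channel is filtered by its own 1D kernel along the token axis, that retargeting is a residual update scaled by a scalar `alpha`, and that cross-layer fusion is a convex blend (weight `beta`) with the previous adapted layer. The names `depthwise_conv1d`, `residual_adapter`, and `cross_layer_fuse` are hypothetical.

```python
from typing import List

# Hypothetical layout: one ViT layer's output as tokens x channels.
Matrix = List[List[float]]


def depthwise_conv1d(x: Matrix, kernels: List[List[float]]) -> Matrix:
    """Depthwise 1D convolution along the token axis.

    Channel c is filtered only by its own kernel kernels[c] (odd length), so
    channels never mix. Zero padding keeps the token count unchanged. As in
    common deep-learning frameworks, this is computed as cross-correlation.
    """
    num_tokens, num_channels = len(x), len(x[0])
    out = [[0.0] * num_channels for _ in range(num_tokens)]
    for c in range(num_channels):
        kernel = kernels[c]
        half = len(kernel) // 2
        for t in range(num_tokens):
            acc = 0.0
            for j, w in enumerate(kernel):
                src = t + j - half  # index of the neighbouring token
                if 0 <= src < num_tokens:  # out-of-range tokens act as zeros
                    acc += w * x[src][c]
            out[t][c] = acc
    return out


def residual_adapter(x: Matrix, kernels: List[List[float]], alpha: float) -> Matrix:
    """Assumed residual retargeting: x + alpha * depthwise_conv1d(x)."""
    delta = depthwise_conv1d(x, kernels)
    return [[xi + alpha * di for xi, di in zip(x_row, d_row)]
            for x_row, d_row in zip(x, delta)]


def cross_layer_fuse(current: Matrix, previous: Matrix, beta: float) -> Matrix:
    """Assumed cross-layer fusion: convex blend with the previous layer's output."""
    return [[beta * a + (1.0 - beta) * b for a, b in zip(c_row, p_row)]
            for c_row, p_row in zip(current, previous)]


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, 2 channels
    kernels = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]   # identity / 3-token sum
    print(residual_adapter(tokens, kernels, alpha=0.5))
    # prints [[1.5, 5.0], [4.5, 10.0], [7.5, 11.0]]
```

Because the convolution is depthwise, each channel only mixes information across neighbouring tokens, which keeps the adapter small; the residual form leaves the frozen CLIP features intact when `alpha` is zero.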

Abstract

Alzheimer's disease (AD) is a prevalent neurodegenerative disorder for which early detection is critical. Handwriting, which is often disrupted in prodromal AD, provides a non-invasive and cost-effective window into subtle motor and cognitive decline. Existing handwriting-based AD studies, which mostly rely on online trajectories and hand-crafted features, have not systematically examined how task type influences diagnostic performance and cross-task generalization. Meanwhile, large-scale vision-language models have demonstrated remarkable zero- or few-shot anomaly detection on natural images and strong adaptability to medical modalities such as chest X-ray and brain MRI. However, handwriting-based disease detection remains largely unexplored within this paradigm. To close this gap, we introduce a lightweight Cross-Layer Fusion Adapter (CLFA) framework that repurposes CLIP for handwriting-based AD screening. CLFA implants multi-level fusion adapters within the visual encoder to progressively align its representations with handwriting-specific medical cues, enabling prompt-free and efficient zero-shot inference. Using this framework, we systematically investigate cross-task generalization (training on a specific handwriting task and evaluating on unseen ones) to reveal which task types and writing patterns most effectively discriminate AD. Extensive analyses further highlight characteristic stroke patterns and task-level factors that contribute to early AD identification, offering both diagnostic insights and a benchmark for handwriting-based cognitive assessment.
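The cross-task protocol described in the abstract (train on one handwriting task, evaluate on the others) produces a train-task x test-task AUC matrix. The sketch below shows that bookkeeping under stated assumptions: `train_fn` and `score_fn` are hypothetical placeholders for fitting the adapters on one task and producing an AD score per sample, each task is assumed to have its own train and test split, and AUC is computed with the standard Mann-Whitney formulation rather than any specific library.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (samples, labels) with label 1 = AD, 0 = healthy control.
Split = Tuple[List[object], List[int]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a random AD
    sample scores higher than a random control, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one AD and one control sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def cross_task_auc_matrix(
    train_sets: Dict[str, Split],
    test_sets: Dict[str, Split],
    train_fn: Callable[[List[object], List[int]], object],
    score_fn: Callable[[object, object], float],
) -> Dict[str, Dict[str, float]]:
    """Rows = training task, columns = testing task.

    Off-diagonal cells measure transfer to unseen tasks; in this sketch the
    diagonal uses the held-out test split of the training task itself.
    """
    matrix: Dict[str, Dict[str, float]] = {}
    for train_task, (train_x, train_y) in train_sets.items():
        model = train_fn(train_x, train_y)  # fit on a single task only
        matrix[train_task] = {}
        for test_task, (test_x, test_y) in test_sets.items():
            scores = [score_fn(model, x) for x in test_x]
            matrix[train_task][test_task] = roc_auc(test_y, scores)
    return matrix


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

Reading a row of the resulting matrix shows how well a model trained on one task transfers; reading a column shows which training tasks best prepare the model for a given test task.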

Paper Structure

This paper contains 21 sections, 9 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Example trajectory images generated from online handwriting. Top row: healthy controls; bottom row: AD patients. We show representative samples from the three major task categories: memory/dictation (M: Task 14, Task 22), copying (C: Task 8, Task 9), and graphic drawing (G: Task 4, Task 24). While some handwriting impairments in AD subjects are relatively easy to spot (e.g., distorted or tremulous strokes), others remain subtle and hard to distinguish from those of healthy controls, reflecting the variability of early-stage AD detection.
  • Figure 2: Overview of the Cross-Layer Fusion Adapter (CLFA). Each selected ViT block hosts a lightweight adapter with depthwise 1D convolution and cross-layer fusion. Fused mid-level descriptors are compared to normal/abnormal CLIP text prototypes for zero-shot detection.
  • Figure 3: Cross-Layer Fusion Adapter.
  • Figure 4: Cross-task AUC matrix for zero-shot AD detection using CLFA. Rows denote training tasks; columns denote testing tasks. Higher off-diagonal values indicate stronger cross-task generalization.
  • Figure 5: Cross-task AUC matrix for zero-shot AD detection using MVFA. Rows denote training tasks; columns denote testing tasks. Higher off-diagonal values indicate stronger cross-task generalization.
  • ...and 5 more figures
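The Figure 2 caption states that fused mid-level descriptors are compared with normal/abnormal CLIP text prototypes for zero-shot detection. A minimal sketch of that comparison follows, assuming CLIP-style scoring: cosine similarity between L2-normalized vectors, multiplied by an assumed logit scale of 100 and passed through a two-way softmax. The prototypes are taken as given vectors (how the paper obtains them is not specified here), and averaging the abnormal probability over the adapted layers is our own assumption. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def abnormal_probability(descriptor: Sequence[float],
                         normal_proto: Sequence[float],
                         abnormal_proto: Sequence[float],
                         scale: float = 100.0) -> float:
    """Two-way softmax over scaled cosine similarities; returns P(abnormal)."""
    logits = [scale * cosine(descriptor, normal_proto),
              scale * cosine(descriptor, abnormal_proto)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)


def zero_shot_score(layer_descriptors: Sequence[Sequence[float]],
                    normal_proto: Sequence[float],
                    abnormal_proto: Sequence[float],
                    scale: float = 100.0) -> float:
    """Assumed aggregation: mean P(abnormal) over the adapted layers."""
    probs = [abnormal_probability(d, normal_proto, abnormal_proto, scale)
             for d in layer_descriptors]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    normal, abnormal = [1.0, 0.0], [0.0, 1.0]  # toy 2-D prototypes
    print(zero_shot_score([[0.2, 0.9], [0.4, 0.6]], normal, abnormal))
```

No text prompt is needed at inference time in this sketch: the only per-sample work is encoding the handwriting image and comparing its descriptors with the two fixed prototypes.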