Understanding Cross Task Generalization in Handwriting-Based Alzheimer's Screening via Vision Language Adaptation
Changqing Gong, Huafeng Qin, Mounim A. El-Yacoubi
TL;DR
This work addresses cross-task generalization in handwriting-based Alzheimer's disease (AD) screening by repurposing a CLIP-based vision-language model through Cross-Layer Fusion Adapters (CLFA). CLFA progressively aligns visual features to handwriting-specific cues via multi-level, tokenwise depthwise fusion and residual retargeting, enabling prompt-free zero-shot inference on unseen handwriting tasks. In experiments on the DARWIN-RAW dataset, CLFA outperforms an MVFA baseline in cross-task AUC and uncovers structured task relationships that map visuospatial, motor, and language processes to AD signatures. These results offer a scalable framework and practical guidance for designing robust handwriting-based digital biomarkers that generalize across diverse handwriting tasks.
Abstract
Alzheimer's disease (AD) is a prevalent neurodegenerative disorder for which early detection is critical. Handwriting, which is often disrupted in prodromal AD, provides a non-invasive and cost-effective window into subtle motor and cognitive decline. Existing handwriting-based AD studies, which mostly rely on online trajectories and hand-crafted features, have not systematically examined how task type influences diagnostic performance and cross-task generalization. Meanwhile, large-scale vision-language models have demonstrated remarkable zero-shot and few-shot anomaly detection in natural images and strong adaptability across medical modalities such as chest X-ray and brain MRI. However, handwriting-based disease detection remains largely unexplored within this paradigm. To close this gap, we introduce a lightweight Cross-Layer Fusion Adapter (CLFA) framework that repurposes CLIP for handwriting-based AD screening. CLFA implants multi-level fusion adapters within the visual encoder to progressively align representations toward handwriting-specific medical cues, enabling prompt-free and efficient zero-shot inference. Using this framework, we systematically investigate cross-task generalization (training on a specific handwriting task and evaluating on unseen ones) to reveal which task types and writing patterns most effectively discriminate AD. Extensive analyses further highlight characteristic stroke patterns and task-level factors that contribute to early AD identification, offering both diagnostic insights and a benchmark for handwriting-based cognitive assessment.
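
The cross-task protocol described in the abstract (fit on one handwriting task, score every unseen task, and report AUC per source/target pair) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the task names, the toy one-dimensional features, and the `fit_direction` scorer are hypothetical placeholders and stand in for neither the DARWIN-RAW tasks nor the CLFA model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy dataset type: (one scalar feature per sample, binary label per sample).
Dataset = Tuple[List[float], List[int]]
Scorer = Callable[[float], float]


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(features: List[float], labels: List[int]) -> Scorer:
    """Placeholder model (assumption): learn only whether AD samples score higher or lower."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x


def cross_task_auc(
    tasks: Dict[str, Dataset],
    fit: Callable[[List[float], List[int]], Scorer],
) -> Dict[Tuple[str, str], float]:
    """Train on each source task, then evaluate on every other (unseen) task."""
    results: Dict[Tuple[str, str], float] = {}
    for source, (x_train, y_train) in tasks.items():
        scorer = fit(x_train, y_train)
        for target, (x_test, y_test) in tasks.items():
            if target == source:
                continue  # only unseen tasks count as cross-task evaluation
            results[(source, target)] = roc_auc(y_test, [scorer(x) for x in x_test])
    return results


# Toy data with hypothetical task names; label 1 marks the AD class.
tasks: Dict[str, Dataset] = {
    "task_a": ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]),
    "task_b": ([0.3, 0.7, 0.4, 0.6], [0, 1, 0, 1]),
    "task_c": ([0.9, 0.1, 0.5, 0.2], [0, 1, 1, 0]),
}

results = cross_task_auc(tasks, fit_direction)
for (source, target), auc in sorted(results.items()):
    print(f"train={source} test={target} AUC={auc:.2f}")
```

The result is an off-diagonal source/target matrix of AUC values. In this toy data, a scorer fitted on `task_a` transfers to `task_b` but not to `task_c`, which is the kind of asymmetric task relationship a cross-task analysis is meant to expose.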
