A Multimodal Approach to Alzheimer's Diagnosis: Geometric Insights from Cube Copying and Cognitive Assessments

Jaeho Yang, Kijung Yoon

TL;DR

This work introduces a multimodal framework that converts hand-drawn cube sketches into graph-structured representations and fuses these with demographic and neuropsychological data to detect Alzheimer's disease. The graph-based approach, particularly using a Graph Attention Network on cube graphs, outperforms pixel-based baselines and gains robustness through late fusion with age, education, and NPT scores. SHAP analyses reveal that specific graphlets and geometric distortions are the strongest predictors, aligning with clinical observations of visuospatial impairment in AD. The method offers a scalable, interpretable, non-invasive screening tool suitable for broader deployment and community screening, with future work extending to multi-class and temporal analyses.

Abstract

Early and accessible detection of Alzheimer's disease (AD) remains a critical clinical challenge, and cube-copying tasks offer a simple yet informative assessment of visuospatial function. This work proposes a multimodal framework that converts hand-drawn cube sketches into graph-structured representations capturing geometric and topological properties, and integrates these features with demographic information and neuropsychological test (NPT) scores for AD classification. Cube drawings are modeled as graphs with node features encoding spatial coordinates, local graphlet-based topology, and angular geometry, which are processed using graph neural networks and fused with age, education, and NPT features in a late-fusion model. Experimental results show that graph-based representations provide a strong unimodal baseline and substantially outperform pixel-based convolutional models, while multimodal integration further improves performance and robustness to class imbalance. SHAP-based interpretability analysis identifies specific graphlet motifs and geometric distortions as key predictors, closely aligning with clinical observations of disorganized cube drawings in AD. Together, these results establish graph-based analysis of cube copying as an interpretable, non-invasive, and scalable approach for Alzheimer's disease screening.
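
The late-fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the category lists `AGE_BINS` and `EDU_LEVELS`, the layer sizes, the helper names, and the random untrained weights are all assumptions made for illustration, and the graph embedding is a random stand-in for the output of a graph neural network. The sketch only shows the data flow: each clinical modality is encoded separately, the embeddings are concatenated, and a final layer produces a score.

```python
import math
import random

# Hypothetical category lists; the source does not enumerate its bins.
AGE_BINS = ["60-64", "65-69", "70-74", "75-79", "80-84"]
EDU_LEVELS = ["primary", "secondary", "tertiary"]


def one_hot(value, categories):
    """Return a one-hot list marking the position of value in categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def make_layer(n_in, n_out, rng):
    """Create a dense layer with random, untrained weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activate=True):
    """Apply a dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if activate else out


def late_fusion_predict(graph_emb, demo, npt, params):
    """Encode each clinical modality separately, concatenate, then classify."""
    h_demo = dense(demo, params["demo"])
    h_npt = dense(npt, params["npt"])
    fused = list(graph_emb) + h_demo + h_npt  # late fusion by concatenation
    logit = dense(fused, params["head"], activate=False)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid score for the AD class


rng = random.Random(0)
graph_emb = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for a GNN output
demo = one_hot("75-79", AGE_BINS) + one_hot("secondary", EDU_LEVELS)
npt = [0.4, -1.2, 0.7]  # made-up standardized test scores

params = {
    "demo": make_layer(len(demo), 4, rng),
    "npt": make_layer(len(npt), 4, rng),
    "head": make_layer(len(graph_emb) + 4 + 4, 1, rng),
}
prob = late_fusion_predict(graph_emb, demo, npt, params)
print(f"untrained AD score: {prob:.3f}")
```

Because the weights are untrained, the printed score carries no diagnostic meaning. In a trained model, the single output layer used here would be replaced by a learned classifier, and the graph embedding would come from the graph branch rather than a random vector.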

Paper Structure

This paper contains 20 sections, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Pipeline for transforming hand-drawn cube images into graph-structured representations. (a) Raw cube drawings collected from participants. (b) Binarized images obtained by adaptive thresholding and noise filtering. (c) Vectorized line representations produced from skeletonized strokes. (d) Line simplification and node extraction, showing intermediate polylines with candidate junctions. (e) Final graph representations after merging nearby nodes and pruning spurious ones, where nodes correspond to cube corners and edges correspond to drawn line segments.
  • Figure 2: Multimodal classification framework integrating cube drawing graphs and clinical features. The cube image is first converted into a cube graph using Algorithm 1 and processed by a graph neural network (GNN) implemented via GraphGym to extract a graph-level representation. Demographic variables (age and education, one-hot encoded) and neuropsychological test scores are independently encoded using modality-specific MLPs. The resulting embeddings from all modalities are concatenated and passed to a final MLP classifier to predict diagnostic status (Normal vs. AD).
  • Figure 3: The bar plot shows the top 10 features ranked by mean absolute SHAP values, reflecting their overall contribution to the model’s predictions. Graph-derived features, particularly graphlet 6 and graphlet 4, exhibit the highest importance, highlighting the central role of cube-structural motifs in AD classification. Age-related variables (e.g., 75–79, 60–64, 80–84) also contribute substantially, indicating strong demographic effects, while education levels provide additional complementary information. NPT features are not shown, as they fall outside the top-ranked features.
  • Figure 4: (a) The first 15 graphlets, including 2-, 3-, and 4-node induced subgraphs, with node indices indicating graphlet orbits that specify the structural role of each node within an isomorphism class. Black and white nodes distinguish different orbit roles within each graphlet, highlighting structurally distinct node positions that collectively characterize the topological signature of the drawing graphs. (b) Three representative cube drawings produced by AD participants. (c) The corresponding cube graph representations extracted from the drawings, where vertices (red dots) denote junctions or endpoints and edges represent connecting strokes.
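
To make the idea of graphlet orbits in Figure 4 concrete, the following sketch counts, for each node, how often it occupies each role in the 2- and 3-node graphlets: edge endpoint (degree), end of an induced 3-node path, center of an induced 3-node path, and member of a triangle. It is an illustration under stated assumptions, not the paper's feature extractor: the function names are hypothetical, the 4-node graphlets are omitted for brevity, and the orbit ordering used here may differ from the indexing in the figure. The idealized cube graph and the toy distorted drawing are made-up inputs.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def small_orbit_counts(adj):
    """Per-node role counts for the 2- and 3-node graphlets.

    Returns {node: (degree, path_end, path_center, triangle)}.
    """
    counts = {}
    for v, nbrs in adj.items():
        # End of an induced 3-node path: v - u - w with w not adjacent to v.
        path_end = sum(
            1 for u in nbrs for w in adj[u] if w != v and w not in nbrs
        )
        path_center = 0
        triangle = 0
        # Each pair of neighbors of v forms either a triangle or an open path.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                triangle += 1
            else:
                path_center += 1
        counts[v] = (len(nbrs), path_end, path_center, triangle)
    return counts


# Idealized cube: corners are 3-bit labels, edges join labels differing in one bit.
cube_edges = [
    (i, i ^ (1 << b)) for i in range(8) for b in range(3) if i < (i ^ (1 << b))
]
cube_counts = small_orbit_counts(build_adjacency(cube_edges))

# A distorted toy drawing: a closed triangle with one dangling stroke.
toy_counts = small_orbit_counts(build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)]))

for node in sorted(toy_counts):
    print(node, toy_counts[node])
```

In the idealized cube every corner has the same signature, `(3, 6, 3, 0)`: three incident edges, no triangles. The toy drawing breaks that uniformity, with a dangling endpoint and triangle memberships that a well-formed cube never produces, which is the kind of structural deviation per-node orbit counts are meant to expose.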