Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives

Jinchao Li, Yuejiao Wang, Junan Li, Jiawen Kang, Bo Zheng, Ka Ho Wong, Brian Mak, Helene H. Fung, Jean Woo, Man-Wai Mak, Timothy Kwok, Vincent Mok, Xianmin Gong, Xixin Wu, Xunying Liu, Patrick C. M. Wong, Helen Meng

TL;DR

This work tackles early detection of neurocognitive disorders (NCDs) from visual-stimulated narratives by focusing on macrostructural narrative properties that unfold over time. It introduces two dynamic approaches: a Dynamic Topic Model (DTM) for tracking topic evolution, and the Text-Image Temporal Alignment Network (TITAN) for cross-modal text-image alignment. Both are evaluated on the Cantonese CU-MARVEL-RABBIT corpus and the English ADReSS and ADReSSo corpora. Results show that macrostructural features outperform microstructural ones. TITAN achieves F1=0.7238 and AUC=0.812 on CU-MARVEL-RABBIT and performs strongly on the English corpora (F1=0.8889 on ADReSS, F1=0.8504 on ADReSSo), while DTM yields meaningful topic-evolution metrics such as topic consistency and topic change rate. The work offers interpretable, cross-lingual, and multimodal capabilities that can support non-invasive screening and a better understanding of linguistic-cognitive interactions in NCDs.

Abstract

Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Given that language impairments manifest early in NCD progression, visual-stimulated narrative (VSN)-based analysis offers a promising avenue for NCD detection. Current VSN-based NCD detection methods primarily focus on linguistic microstructures (e.g., lexical diversity) that are closely tied to bottom-up, stimulus-driven cognitive processes. While these features illuminate basic language abilities, the higher-order linguistic macrostructures (e.g., topic development) that may reflect top-down, concept-driven cognitive abilities remain underexplored. These macrostructural patterns are crucial for NCD detection, yet challenging to quantify due to their abstract and complex nature. To bridge this gap, we propose two novel macrostructural approaches: (1) a Dynamic Topic Model (DTM) to track topic evolution over time, and (2) a Text-Image Temporal Alignment Network (TITAN) to measure cross-modal consistency between narrative and visual stimuli. Experimental results show the effectiveness of the proposed approaches in NCD detection, with TITAN achieving superior performance across three corpora: ADReSS (F1=0.8889), ADReSSo (F1=0.8504), and CU-MARVEL-RABBIT (F1=0.7238). Feature contribution analysis reveals that macrostructural features (e.g., topic variability, topic change rate, and topic consistency) constitute the most significant contributors to the model's decision pathways, outperforming the investigated microstructural features. These findings underscore the value of macrostructural analysis for understanding linguistic-cognitive interactions associated with NCDs.
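The abstract names topic variability, topic change rate, and topic consistency as key macrostructural features but does not define them in this summary. As a rough illustration only, the following Python sketch computes hypothetical versions of these features from a (T, K) matrix of per-segment topic proportions, like the topic-evolution curves in Figure 1(c) and 1(d). The definitions are assumptions and are not the paper's formulas: change rate is the share of adjacent segments whose dominant topic differs, consistency is the mean cosine similarity of consecutive proportion vectors, and variability is the mean per-topic standard deviation over time. The function name `topic_evolution_metrics` is likewise hypothetical.

```python
import numpy as np


def topic_evolution_metrics(theta) -> dict:
    """Hypothetical topic-evolution features for one narrative.

    theta: shape (T, K); row t holds the topic proportions (summing to 1)
    of the t-th time segment. These definitions are illustrative stand-ins,
    not the paper's exact formulas.
    """
    theta = np.asarray(theta, dtype=float)
    if theta.ndim != 2 or theta.shape[0] < 2:
        raise ValueError("theta must have shape (T, K) with T >= 2")

    # Dominant (highest-proportion) topic in each segment.
    dominant = theta.argmax(axis=1)

    # Change rate: fraction of adjacent segment pairs whose dominant topic differs.
    change_rate = float(np.mean(dominant[1:] != dominant[:-1]))

    # Consistency: mean cosine similarity between consecutive proportion vectors.
    prev, nxt = theta[:-1], theta[1:]
    cos = (prev * nxt).sum(axis=1) / (
        np.linalg.norm(prev, axis=1) * np.linalg.norm(nxt, axis=1)
    )
    consistency = float(cos.mean())

    # Variability: each topic's standard deviation over time, averaged over topics.
    variability = float(theta.std(axis=0).mean())

    return {
        "topic_change_rate": change_rate,
        "topic_consistency": consistency,
        "topic_variability": variability,
    }
```

For example, under this definition a narrative whose dominant topic moves A, A, B, C over four segments has a change rate of 2/3, because two of its three transitions switch topics. A narrative that stays on one stable topic mix has a change rate of 0, a consistency of 1, and a variability of 0.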

Paper Structure (25 sections, 11 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Topic distributions for (a) the healthy control (HC) group and (b) the NCD group, and topic evolution for (c) the HC group and (d) the NCD group, as obtained from DTM. Plots (a) and (b) show the weights of the top-10 topic words for each topic (A to E) in that group. Plots (c) and (d) display the average topic proportions as curves and their standard deviations as shaded areas. Topics A to E are ordered by the positions of their first peaks in the topic evolution.
  • Figure 2: (a) The proposed TITAN model. (b) The attention module of TITAN. (c) The correlation map between image and text embeddings in TITAN, where the 15 textual chunks are manually segmented and aligned with the corresponding images. TITAN takes images and text narratives as inputs to predict NCDs, and its attention mechanism uses rotary position embedding (RoPE) to emphasize important positions.
  • Figure 3: Correlation coefficients between the proposed features and NCD labels, with statistical significance shown on the x-axis. The heights of the green (macrostructural) and red (microstructural) bars represent correlation values, indicating that most top-ranked features are macrostructural.
  • Figure 4: Global SHAP value distribution (beeswarm plot) across all test samples in the CU-MARVEL-RABBIT corpus. Features are ranked vertically by mean absolute SHAP value (most impactful at the top). Each point represents an instance: its horizontal position indicates the SHAP value (negative or positive impact), its color reflects the standardized feature value (blue = low, red = high), and the density of points along each feature's row indicates the distribution of values.
  • Figure 5: Attention mechanism in text-visual alignment. (a) Raw text-visual correlation differences between the NCD and HC groups. (b) Raw position-wise similarities of RoPE. Text-visual attention weights for (c) the HC group and (d) the NCD group, with texts as the query and images as the key. For visualization purposes, the text timeline is resampled (via downsampling or interpolation) to match the number of images.
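Figures 2 and 5 describe TITAN's attention as text-to-image, with text chunks as queries, images as keys, and RoPE supplying position information. The NumPy sketch below illustrates only that mechanism under stated assumptions and is not TITAN itself. It omits learned query/key/value projections and every other part of the architecture. It uses the common RoPE frequency base of 10000 and, as a hypothetical choice, places both sequences on a shared timeline [0, M - 1], where M is the number of images, so that fractional text positions line up with image positions. The function names `rope` and `text_to_image_attention` are illustrative.

```python
import numpy as np


def rope(x, positions, base: float = 10000.0) -> np.ndarray:
    """Rotary position embedding: rotate feature pairs of x (N, d), d even, by position-dependent angles."""
    x = np.asarray(x, dtype=float)
    positions = np.asarray(positions, dtype=float)
    _, d = x.shape
    if d % 2:
        raise ValueError("RoPE needs an even embedding dimension")
    inv_freq = base ** (-np.arange(0, d, 2) / d)        # (d/2,) rotation frequencies
    angles = positions[:, None] * inv_freq[None, :]     # (N, d/2) angle for each pair
    cos, sin = np.cos(angles), np.sin(angles)
    even, odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = even * cos - odd * sin               # 2-D rotation of each (even, odd) pair
    out[:, 1::2] = even * sin + odd * cos
    return out


def text_to_image_attention(text_emb, image_emb):
    """Scaled dot-product attention: text chunks are queries, images are keys and values.

    Hypothetical simplification: no learned projections, and both sequences
    are mapped onto a shared timeline [0, n_images - 1] before RoPE.
    """
    text_emb = np.asarray(text_emb, dtype=float)
    image_emb = np.asarray(image_emb, dtype=float)
    n_text, d = text_emb.shape
    n_img = image_emb.shape[0]
    text_pos = np.linspace(0.0, n_img - 1, n_text)      # stretch text timeline onto image timeline
    img_pos = np.arange(n_img, dtype=float)
    q = rope(text_emb, text_pos)
    k = rope(image_emb, img_pos)
    scores = q @ k.T / np.sqrt(d)                       # (n_text, n_img) similarity scores
    scores -= scores.max(axis=1, keepdims=True)         # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over images for each text chunk
    context = weights @ image_emb                       # image-informed representation per chunk
    return weights, context
```

Because rotations preserve inner products, a text chunk and an image at the same timeline position keep their raw similarity. Between positions m and n, the positional effect on the score depends only on the offset m - n. This is what allows position to shape the attention map, as in panel (b) of Figure 5.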