CogPic: A Multimodal Dataset for Early Cognitive Impairment Assessment via Picture Description Tasks

Liuyu Wu, Rui Feng, Jie Li, Wentao Xiang, Yi Zhang, Yin Cao, Siyang Song, Xiao Gu, Jianqing Li, Wei Wang

Abstract

The automated evaluation of cognitive status utilizing multimedia technologies presents a promising frontier in early dementia diagnosis. However, the development of robust machine learning models for cognitive impairment detection is frequently hindered by the scarcity of large-scale, strictly synchronized, and clinically validated multimodal datasets. To bridge this critical gap, we introduce the CogPic database, a comprehensive multimodal benchmark meticulously designed for fine-grained cognitive impairment detection. The dataset comprises strictly synchronized audio, visual, and linguistic data continuously collected from 574 participants during a naturalistic picture description task. To establish highly reliable diagnostic ground truth, expert clinical neuropsychologists conducted exhaustive evaluations, stratifying participants into distinct cognitive groups through a comprehensive clinical consensus. Consequently, CogPic stands as the largest, most modality-rich, and most meticulously evaluated dataset of its kind to date. By conducting extensive benchmark experiments on the CogPic dataset, we establish an exceptionally robust, unbiased, and clinically generalizable foundation to propel future multimedia research in automated cognitive health assessment. Detailed information and access application procedures for our CogPic database are available at https://cogpic.github.io/.

Paper Structure

This paper contains 14 sections, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: The overall pipeline of CogPic dataset construction, consisting of four stages: (1) strict inclusion screening; (2) task introduction; (3) synchronized multimodal data collection; and (4) expert clinical assessment for diagnostic consensus.
  • Figure 2: Characteristics of the CogPic dataset: (a) demographic distributions across cognitive cohorts, gender, and education levels; (b) task-specific mean response durations; and (c) high-frequency linguistic tokens visualized via word cloud.
  • Figure 3: Interpretability analysis via SHAP global feature attribution. The beeswarm plot ranks the top 20 most influential cross-modal handcrafted features for AD prediction using the optimal XGBoost classifier.