Detection of Mild Cognitive Impairment Using Facial Features in Video Conversations

Muath Alsuhaibani; Hiroko H. Dodge; Mohammad H. Mahoor

TL;DR

This paper targets early, non-invasive detection of Mild Cognitive Impairment (MCI) in community settings with a two-stage pipeline: a convolutional autoencoder (CAE) produces a $128$-dimensional latent facial representation, and a transformer models its temporal structure. Segment and sequence information is encoded through summed positional embeddings, $P = P_M + P_S + P_p$, operating on video frames downsampled to $10$ fps and fed to a four-layer transformer with a classification token. The best configuration reaches about $88\%$ accuracy and $AUC \approx 0.87$, outperforming non-temporal baselines and remaining competitive with prior modality-based methods on the I-CONECT dataset. The approach points to a scalable, non-invasive screening path for MCI in real-world home settings; future work includes automated video quality assessment and multimodal fusion with speech data.
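As a rough illustration of two steps named in the TL;DR, the sketch below picks the frame indices kept when a video is reduced to $10$ fps and sums three embedding vectors in the manner of $P = P_M + P_S + P_p$. It is a minimal sketch under stated assumptions, not the authors' implementation: the source frame rate, the function names `downsample_indices` and `positional_embedding`, and the reading of the three terms as lookup tables indexed by integer ids are all hypothetical, since the TL;DR does not define them. Toy two-dimensional vectors stand in for the $128$-dimensional features.

```python
from typing import List, Sequence


def downsample_indices(n_frames: int, src_fps: float, dst_fps: float = 10.0) -> List[int]:
    """Return the indices of frames kept when going from src_fps to dst_fps.

    Assumption: frames are dropped at a uniform stride; the source does not
    say how the downsampling to 10 fps is actually performed.
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    if dst_fps >= src_fps:
        # Nothing to drop: the video is already at or below the target rate.
        return list(range(n_frames))
    step = src_fps / dst_fps  # > 1, so the kept indices are strictly increasing
    kept: List[int] = []
    k = 0
    while int(k * step) < n_frames:
        kept.append(int(k * step))
        k += 1
    return kept


def positional_embedding(
    m_id: int,
    s_id: int,
    p_id: int,
    table_m: Sequence[Sequence[float]],
    table_s: Sequence[Sequence[float]],
    table_p: Sequence[Sequence[float]],
) -> List[float]:
    """Element-wise sum of three embedding rows, mirroring P = P_M + P_S + P_p.

    Assumption: each term is a row looked up from its own table by an
    integer id. What the ids index is hypothetical here.
    """
    rows = (table_m[m_id], table_s[s_id], table_p[p_id])
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all embedding rows must share one dimension")
    return [a + b + c for a, b, c in zip(*rows)]


# Hypothetical usage: a 3-second clip recorded at an assumed 30 fps.
kept_frames = downsample_indices(n_frames=90, src_fps=30.0)
toy_m = [[1.0, 0.0], [0.0, 1.0]]
toy_s = [[2.0, 2.0], [4.0, 4.0]]
toy_p = [[0.5, 0.5], [1.5, 1.5], [2.5, 2.5]]
frame_embedding = positional_embedding(1, 0, 2, toy_m, toy_s, toy_p)
```

Summing the terms keeps the embedding at the same width as the latent feature vector, so it can be added to each frame's features before the transformer without changing the model dimension.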

Abstract

Early detection of Mild Cognitive Impairment (MCI) enables early interventions to slow the progression from MCI to dementia. Deep Learning (DL) algorithms could help achieve early, non-invasive, low-cost detection of MCI. This paper presents the detection of MCI in older adults using DL models based only on facial features extracted from video-recorded conversations at home. We used data collected in the I-CONECT behavioral intervention study (NCT02871921), in which several sessions of semi-structured interviews between socially isolated older individuals and interviewers were video recorded. We developed a framework that extracts holistic spatial facial features using a convolutional autoencoder and temporal information using transformers. The proposed DL model was able to distinguish the cognitive condition of I-CONECT study participants (MCI vs. normal cognition (NC)) using facial features. The segment and sequence information of the facial features improved prediction performance compared with non-temporal features. Detection accuracy with this combined method reached 88%, compared with 84% without the segment and sequence information of the facial features within a video on a given theme.

Paper Structure (12 sections, 6 equations, 6 figures, 8 tables)

Introduction
Related Work
Materials and Methods
Dataset
Data preprocessing
Unsupervised learning
Temporal information
Experiments and Results
Implementation
Results
Ablation study
Conclusion

Figures (6)

Figure 1: The Preprocessing Steps of Participants' Videos.
Figure 2: Examples of participants' faces and their autoencoder reconstructions.
Figure 3: The vector cosine similarities between several latent feature vectors.
Figure 4: Frame extraction from a video with segment and sequence labeling.
Figure 5: An example of the number of segments and sequences for participants with different cognitive conditions.
...and 1 more figures
