Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez, Manuel Benavent-Lledo, Jose Garcia-Rodriguez, David Tomás, M. Flores Vizcaya-Moreno
TL;DR
The paper tackles non-intrusive cognitive decline detection using deep learning by surveying audio, text, and visual modalities and their multimodal integrations. It presents a PRISMA-guided methodology, catalogs datasets, and builds a taxonomy of unimodal and multimodal approaches with performance insights across tasks like Alzheimer’s and MCI detection. Key findings show that text-based signals often provide the strongest cues, while multimodal fusion generally improves performance, though data scarcity and privacy issues limit real-world adoption. The work highlights ethical, fairness, and deployment considerations, and outlines future directions toward scalable, multilingual, and interpretable models with richer datasets. Overall, it provides a comprehensive roadmap for researchers and clinicians aiming to develop robust non-invasive cognitive assessment tools.
Abstract
Cognitive decline is a natural part of aging. However, under some circumstances, this decline is more pronounced than expected, typically due to disorders such as Alzheimer's disease. Early detection of an anomalous decline is crucial, as it can facilitate timely professional intervention. While medical data can help, it often involves invasive procedures. An alternative approach is to employ non-intrusive techniques such as speech or handwriting analysis, which do not disturb daily activities. This survey reviews the most relevant non-intrusive methodologies that use deep learning techniques to automate the cognitive decline detection task, including audio, text, and visual processing. We discuss the key features and advantages of each modality and methodology, including state-of-the-art approaches like Transformer architecture and foundation models. In addition, we present studies that integrate different modalities to develop multimodal models. We also highlight the most significant datasets and the quantitative results from studies using these resources. From this review, several conclusions emerge. In most cases, text-based approaches consistently outperform other modalities. Furthermore, combining various approaches from individual modalities into a multimodal model consistently enhances performance across nearly all scenarios.
