Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis

David Ortiz-Perez, Jose Garcia-Rodriguez, David Tomás

TL;DR

The proposed model transcribes the interviews and identifies the language spoken in each one. It then extracts audio and text features and combines them in a multimodal architecture to achieve robust, generalizable results.

Abstract

Cognitive decline is a natural process that occurs as individuals age. Early diagnosis of anomalous decline is crucial for initiating professional treatment that can improve the quality of life of those affected. To address this issue, we propose a multimodal model capable of predicting Mild Cognitive Impairment (MCI) and cognitive scores. The evaluation is conducted on the TAUKADIAL dataset, which comprises audio recordings of clinical interviews. The proposed model can transcribe the interviews and identify the language used in each one. The model then extracts audio and text features and combines them in a multimodal architecture to achieve robust, generalizable results. Our approach includes an in-depth study of the various features that can be extracted from each modality.
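
The abstract does not say how the audio and text features are combined or how the two prediction targets are produced. As a minimal sketch, assuming simple early fusion (concatenating one audio feature vector and one text feature vector per interview) followed by two linear heads, one giving an MCI probability (classification) and one giving a cognitive score (regression), the idea might look like the code below. The function and class names (`fuse_features`, `LinearHead`, `predict`), the toy dimensions, and the weights are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of early (feature-level) fusion with two output heads.
# This is NOT the authors' architecture; it only illustrates the general idea.


def fuse_features(audio: List[float], text: List[float]) -> List[float]:
    """Concatenate audio and text feature vectors into one multimodal vector."""
    return list(audio) + list(text)


@dataclass
class LinearHead:
    """A single linear layer: w . x + b (weights are illustrative placeholders)."""
    weights: List[float]
    bias: float

    def __call__(self, x: List[float]) -> float:
        if len(x) != len(self.weights):
            raise ValueError("feature size does not match weight size")
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


def sigmoid(z: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(audio: List[float], text: List[float],
            mci_head: LinearHead, score_head: LinearHead) -> Tuple[float, float]:
    """Fuse both modalities, then run the classification and regression heads."""
    fused = fuse_features(audio, text)
    mci_prob = sigmoid(mci_head(fused))  # classification: probability of MCI
    score = score_head(fused)            # regression: predicted cognitive score
    return mci_prob, score


if __name__ == "__main__":
    audio_feats = [0.2, 0.5]       # toy audio features (hypothetical)
    text_feats = [1.0, 0.0, 0.3]   # toy text features (hypothetical)
    mci_head = LinearHead([0.0] * 5, 0.0)
    score_head = LinearHead([1.0] * 5, 25.0)
    print(predict(audio_feats, text_feats, mci_head, score_head))
```

Concatenation is only one fusion option; attention-based or late (decision-level) fusion would be equally plausible readings of "multimodal architecture", and in practice the heads would be trained rather than hand-set.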

Paper Structure

This paper contains 16 sections, 4 equations, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Overview of the multimodal architecture.