Personality-Enhanced Multimodal Depression Detection in the Elderly

Honghong Wang, Jing Deng, Rong Zheng

TL;DR

This work tackles depression detection in the elderly with a personality-aware multimodal framework that fuses audio and visual cues. The audio stream fuses multi-level features (LLDs, MFCCs, Wav2Vec) through a co-attention mechanism, while the video stream aggregates OpenFace, DenseNet, and ResNet features; utterance-level representations are produced via ASP and Transformer Fusion. A Personality Traits and Multimodal Feature Interaction Module (PTMFIM) models correlations between Big Five personality traits and the multimodal representations through Binary Correlation Attention, Triple Interaction Attention, and a Gate Regulator, using a demographic-driven textual embedding to personalize the analysis. Evaluated on the MPDD Elderly dataset with 1 s and 5 s windows, the approach is reported to improve significantly over the baselines on binary, ternary, and quinary depression classification, which points to the value of personality-aware personalization in multimodal depression detection. The findings support personalized, multimodal screening as a route to more accurate and age-appropriate mental health assessment in clinical and community settings.
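To make the utterance-level pooling step concrete, here is a minimal NumPy sketch. It assumes that "ASP" refers to attentive statistics pooling, which the summary does not spell out, and every name, dimension, and weight below is hypothetical and randomly initialized rather than taken from the paper. The idea is to score each frame, turn the scores into attention weights, and summarize the sequence by its weighted mean and weighted standard deviation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_statistics_pooling(frames, W, b, v):
    """Pool frame-level features (T, D) into one utterance vector (2*D,).

    W (D, H), b (H,), v (H,) are hypothetical attention parameters.
    """
    scores = np.tanh(frames @ W + b) @ v            # one score per frame, (T,)
    alpha = softmax(scores, axis=0)                 # attention weights over time
    mean = (alpha[:, None] * frames).sum(axis=0)    # weighted mean, (D,)
    var = (alpha[:, None] * frames ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-8, None))         # weighted std, (D,)
    return np.concatenate([mean, std])

# Toy input: 50 frames of 16-dimensional features (assumed sizes).
rng = np.random.default_rng(0)
T, D, H = 50, 16, 8
frames = rng.standard_normal((T, D))
W = rng.standard_normal((D, H)) * 0.1
b = np.zeros(H)
v = rng.standard_normal(H) * 0.1

utterance_vec = attentive_statistics_pooling(frames, W, b, v)
```

With `v` set to zeros the attention weights become uniform, so the output reduces to the plain per-dimension mean and standard deviation; the learned attention is what lets informative frames dominate the summary.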

Abstract

This paper presents our solution to the Multimodal Personality-aware Depression Detection (MPDD) challenge at ACM MM 2025. We propose a multimodal depression detection model for the elderly that incorporates personality characteristics. For the audio modality, we introduce a multi-feature fusion approach based on a co-attention mechanism that integrates LLDs, MFCCs, and Wav2Vec features. For the video modality, we combine representations extracted from OpenFace, ResNet, and DenseNet to construct a comprehensive visual feature set. Recognizing the critical role of personality in depression detection, we design an interaction module that captures the relationships between personality traits and multimodal features. Experimental results on the MPDD Elderly Depression Detection track show that our method significantly improves performance, providing useful insights for future research on multimodal depression detection in elderly populations.
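The abstract does not give the exact form of the co-attention fusion, so the following is only a rough sketch of one common way such a mechanism can work, not the authors' implementation. It assumes each audio feature stream is first projected to a shared dimension, that each stream then attends to the other two with scaled dot-product attention, and that the results are mean-pooled and concatenated. The feature sizes, the random projections, and the function names are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    """Scaled dot-product attention from query (Tq, d) over context (Tc, d)."""
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)  # (Tq, Tc)
    return weights @ context                                    # (Tq, d)

def co_attention_fuse(streams):
    """Fuse a list of (T_i, d) streams into one vector of size len(streams) * d."""
    fused = []
    for i, stream in enumerate(streams):
        # Each stream attends to the frames of all the other streams.
        others = np.concatenate(
            [s for j, s in enumerate(streams) if j != i], axis=0
        )
        attended = stream + cross_attention(stream, others)  # residual connection
        fused.append(attended.mean(axis=0))                  # pool over time
    return np.concatenate(fused)

# Toy stand-ins for LLD, MFCC, and Wav2Vec frame features (assumed sizes).
rng = np.random.default_rng(0)
d = 32
lld = rng.standard_normal((40, 25))
mfcc = rng.standard_normal((40, 13))
w2v = rng.standard_normal((20, 768))

# Hypothetical linear projections into the shared d-dimensional space.
raw = (lld, mfcc, w2v)
projections = [rng.standard_normal((x.shape[1], d)) / np.sqrt(x.shape[1]) for x in raw]
streams = [x @ P for x, P in zip(raw, projections)]

audio_vec = co_attention_fuse(streams)
```

Because attention operates over frames rather than requiring aligned time steps, the streams may have different lengths, which matters when hand-crafted descriptors and Wav2Vec features are extracted at different frame rates.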

Paper Structure

This paper contains 9 sections, 2 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of our multimodal depression detection framework integrating personality traits.
  • Figure 2: Multimodal feature and personality traits interaction module.