Differential Mental Disorder Detection with Psychology-Inspired Multimodal Stimuli

Zhiyuan Zhou, Jingjing Wu, Zhibo Lei, Junyu Guo, Zhongcheng Yu, Yuqi Chu, Xiaowei Zhang, Qiqi Zhao, Qi Wang, Shijie Hao, Yanrong Guo, Richang Hong

Abstract

Differential diagnosis of mental disorders remains a fundamental challenge in real-world clinical practice, where multiple conditions often exhibit overlapping symptoms. Most existing public datasets, however, are developed under single-disorder settings and rely on limited data elicitation paradigms, restricting their ability to capture disorder-specific patterns. In this work, we investigate differential mental disorder detection through psychology-inspired multimodal stimuli designed to elicit diverse emotional, cognitive, and behavioral responses grounded in findings from experimental psychology. Based on this paradigm, we collect a large-scale multimodal mental health dataset (MMH) covering depression, anxiety, and schizophrenia, with all diagnostic labels clinically verified by licensed psychiatrists. To effectively model the heterogeneous signals induced by diverse elicitation tasks, we further propose a paradigm-aware multimodal framework that leverages prior knowledge of inter-disorder differences, encoded as prompt-guided semantic descriptions, to capture task-specific affective and interaction contexts for multimodal representation learning on the new differential detection task. Extensive experiments show that our framework consistently outperforms existing baselines, underscoring the value of psychology-inspired stimulus design for differential mental disorder detection.
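The paradigm-aware pretraining described in the abstract (Stage 1 in Figure 3) amounts to contrastively aligning visual features with paradigm-level text embeddings. Below is a minimal sketch of such an alignment objective, assuming a CLIP-style symmetric InfoNCE loss; the function name, the temperature value, and the embedding shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def paradigm_alignment_loss(video_emb: torch.Tensor,
                            text_emb: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style InfoNCE loss aligning video clips with the MLLM-generated
    semantic descriptions of their elicitation paradigms (hypothetical sketch).

    video_emb: (B, D) pooled visual features, one per clip
    text_emb:  (B, D) embeddings of the paradigm descriptions, row-aligned
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                      # (B, B) similarities
    targets = torch.arange(v.size(0), device=v.device)  # diagonal = positives
    # Symmetric objective: each clip's own paradigm description is its
    # positive; all other descriptions in the batch act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```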

Paper Structure

This paper contains 21 sections, 2 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Comparison between conventional single-disorder assessment based on interview- or reading-based paradigms and our multi-disorder differential diagnosis setting built on a psychology-inspired multimodal elicitation paradigm, together with the proposed paradigm-aware multimodal learning framework. MD, ANX, SC and HC refer to depression, anxiety, schizophrenia and healthy controls, respectively.
  • Figure 2: Overview of the psychology-inspired multimodal stimulus paradigm.
  • Figure 3: Overview of the proposed paradigm-level prompt-guided learning framework, taking the four-class downstream task as an example. Stage 1 pretrains a video feature extractor by aligning visual cues with paradigm-level semantic descriptions generated by an MLLM (see the alignment sketch above). Stage 2 integrates these paradigm-aware visual features with complementary audio and text modalities through a cross-modality interaction module for multi-disorder detection (see the sketch after this list).
  • Figure 4: Modality-level ablation study.
  • Figure 5: Module-level ablation study on the paradigm-aware multimodal learning framework. (PT: Pretraining; CA: Cross-Attention; CL: Contrastive Learning) 'w/o' denotes 'without'.
  • ...and 1 more figure
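Figure 3's Stage 2 fuses the paradigm-aware visual features with audio and text through a cross-modality interaction module. The sketch below shows one plausible realization built on cross-attention, where the visual stream queries the other two modalities; the class name, the use of nn.MultiheadAttention, and all dimensions are assumptions for illustration rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical cross-modality interaction module for four-class
    detection (MD / ANX / SC / HC)."""

    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 4):
        super().__init__()
        self.va_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vt_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video, audio, text):
        # video/audio/text: (B, T, dim) sequence features per modality.
        v_a, _ = self.va_attn(video, audio, audio)  # video attends to audio
        v_t, _ = self.vt_attn(video, text, text)    # video attends to text
        fused = (video + v_a + v_t).mean(dim=1)     # residual sum + mean-pool
        return self.head(fused)                     # (B, num_classes) logits
```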