Table of Contents
Fetching ...

Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health

Yongquan Hu, Shuning Zhang, Ting Dang, Hong Jia, Flora D. Salim, Wen Hu, Aaron J. Quigley

TL;DR

The paper investigates leveraging large language models to evaluate multimodal data for mental health, focusing on EEG alongside audio and facial cues. It introduces MultiEEG-GPT, employs zero-shot and 1-shot prompts with GPT-4o, and evaluates on MODMA, PME4, and LUMED-2 datasets. Results show that multimodal integration, including EEG, improves depression and emotion recognition over single modalities, with 1-shot prompting outperforming zero-shot. The work highlights practical potential and ethical considerations for deploying LLM-based health agents in ubiquitous computing and affective systems, while suggesting avenues for further improvement through instruction tuning and multi-strategy prediction.

Abstract

Integrating physiological signals such as electroencephalogram (EEG), with other data such as interview audio, may offer valuable multimodal insights into psychological states or neurological disorders. Recent advancements with Large Language Models (LLMs) position them as prospective ``health agents'' for mental health assessment. However, current research predominantly focus on single data modalities, presenting an opportunity to advance understanding through multimodal data. Our study aims to advance this approach by investigating multimodal data using LLMs for mental health assessment, specifically through zero-shot and few-shot prompting. Three datasets are adopted for depression and emotion classifications incorporating EEG, facial expressions, and audio (text). The results indicate that multimodal information confers substantial advantages over single modality approaches in mental health assessment. Notably, integrating EEG alongside commonly used LLM modalities such as audio and images demonstrates promising potential. Moreover, our findings reveal that 1-shot learning offers greater benefits compared to zero-shot learning methods.

Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health

TL;DR

The paper investigates leveraging large language models to evaluate multimodal data for mental health, focusing on EEG alongside audio and facial cues. It introduces MultiEEG-GPT, employs zero-shot and 1-shot prompts with GPT-4o, and evaluates on MODMA, PME4, and LUMED-2 datasets. Results show that multimodal integration, including EEG, improves depression and emotion recognition over single modalities, with 1-shot prompting outperforming zero-shot. The work highlights practical potential and ethical considerations for deploying LLM-based health agents in ubiquitous computing and affective systems, while suggesting avenues for further improvement through instruction tuning and multi-strategy prediction.

Abstract

Integrating physiological signals such as electroencephalogram (EEG), with other data such as interview audio, may offer valuable multimodal insights into psychological states or neurological disorders. Recent advancements with Large Language Models (LLMs) position them as prospective ``health agents'' for mental health assessment. However, current research predominantly focus on single data modalities, presenting an opportunity to advance understanding through multimodal data. Our study aims to advance this approach by investigating multimodal data using LLMs for mental health assessment, specifically through zero-shot and few-shot prompting. Three datasets are adopted for depression and emotion classifications incorporating EEG, facial expressions, and audio (text). The results indicate that multimodal information confers substantial advantages over single modality approaches in mental health assessment. Notably, integrating EEG alongside commonly used LLM modalities such as audio and images demonstrates promising potential. Moreover, our findings reveal that 1-shot learning offers greater benefits compared to zero-shot learning methods.
Paper Structure (13 sections, 1 figure, 2 tables)

This paper contains 13 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Case analysis for LUMED-2 and PME4 datasets (the person's face has been blurred for ethical reasons). Figure (a) illustrates one subject's input EEG topology map and his facial expression, as well as the prediction result and the text explanation from LUMED-2 dataset. Figure (b) illustrates one subject's input EEG topology map, audio features, input audio transcription "The sky is green.", as well as the prediction result and the explanation, from PME4 dataset. In both cases, the model makes the accurate predictions when processing both modalities.