MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models

Kailai Yang, Tianlin Zhang, Ziyan Kuang, Qianqian Xie, Jimin Huang, Sophia Ananiadou

TL;DR

This paper formalizes interpretable mental health analysis on social media as a generation task and introduces the IMHI multi-task, multi-source dataset (105K samples) built with expert instructions and ChatGPT-generated explanations. It then presents MentaLLaMA, an open-source LLM series based on LLaMA2 that is instruction-tuned on IMHI to produce predictions and explanations, and evaluates it on a holistic IMHI benchmark with 19K test samples across 8 tasks. Results show MentaLLaMA matching or approaching state-of-the-art discriminative methods in correctness and delivering high-quality explanations comparable to ChatGPT, with strong generalization to unseen tasks. The work highlights the potential of open-source, instruction-following LLMs for interpretable mental health analysis while noting limitations in professionality and the need for continual pretraining and better automatic evaluation metrics.

Abstract

With the development of web technology, social media texts are becoming a rich source for automatic mental health analysis. Because traditional discriminative methods suffer from low interpretability, recent large language models (LLMs) have been explored for interpretable mental health analysis on social media, which aims to provide detailed explanations along with predictions. The results show that ChatGPT can generate near-human explanations for its correct classifications. However, LLMs still achieve unsatisfactory classification performance in zero-shot and few-shot settings. Domain-specific finetuning is an effective solution, but it faces two challenges: 1) a lack of high-quality training data, and 2) no open-source LLMs for interpretable mental health analysis have been released to lower the finetuning cost. To alleviate these problems, we build the first multi-task and multi-source interpretable mental health instruction (IMHI) dataset on social media, with 105K data samples. The raw social media data are collected from 10 existing sources covering 8 mental health analysis tasks. We use expert-written few-shot prompts and the collected labels to prompt ChatGPT and obtain explanations from its responses. To ensure the reliability of the explanations, we perform strict automatic and human evaluations on the correctness, consistency, and quality of the generated data. Based on the IMHI dataset and LLaMA2 foundation models, we train MentaLLaMA, the first open-source LLM series for interpretable mental health analysis with instruction-following capability. We also evaluate MentaLLaMA on the IMHI evaluation benchmark with 10 test sets, examining both the correctness of its predictions and the quality of its explanations. The results show that MentaLLaMA approaches state-of-the-art discriminative methods in correctness and generates high-quality explanations.

Paper Structure (27 sections, 2 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Some examples of MentaLLaMA's capabilities in diverse mental health analysis tasks.
  • Figure 2: Three components are concatenated to construct the prompts. The key information is marked in blue.
  • Figure 3: Automatic evaluation results on ChatGPT-generated data.
  • Figure 4: Distributions of human evaluation scores on ChatGPT-generated explanations. Orange lines and green dots denote the median and average numbers.
  • Figure 5: BART-score evaluation results on the IMHI test set and expert-written gold set.
  • ...and 1 more figure