
MentalQA: An Annotated Arabic Corpus for Questions and Answers of Mental Healthcare

Hassan Alhuzali, Ashwag Alasmari, Hamad Alsaleh

TL;DR

MentalQA addresses the scarcity of Arabic mental health resources by building an annotated Arabic QA corpus derived from an online medical platform. The authors define a six-category question taxonomy and three answer-strategy classes, validated through a multi-annotator workflow with substantial agreement. They propose three coupled tasks—question classification, answer strategy classification, and a conversational-style QA framework—and present extensive analyses of demographics, sentiment, word usage, and answering behavior to demonstrate the dataset's utility for Arabic NLP in mental health. The resource enables development of Arabic clinical QA tools, patient-facing chatbots, and training data for LLMs, with plans to expand coverage and further validate across tasks.

Abstract

Mental health disorders significantly impact people globally, regardless of background, education, or socioeconomic status. However, access to adequate care remains a challenge, particularly for underserved communities with limited resources. Text mining tools offer immense potential to support mental healthcare by assisting professionals in diagnosing and treating patients. This study addresses the scarcity of Arabic mental health resources for developing such tools. We introduce MentalQA, a novel Arabic dataset featuring conversational-style question-and-answer (QA) interactions. To ensure data quality, we conducted a rigorous annotation process using a well-defined schema with quality control measures. Data was collected from a question-answering medical platform. The annotation schema for mental health questions and corresponding answers draws upon existing classification schemes with some modifications. Question types encompass six distinct categories: diagnosis, treatment, anatomy & physiology, epidemiology, healthy lifestyle, and provider choice. Answer strategies include information provision, direct guidance, and emotional support. Three experienced annotators collaboratively annotated the data to ensure consistency. Our findings demonstrate high inter-annotator agreement, with Fleiss' Kappa of $0.61$ for question types and $0.98$ for answer strategies. In-depth analysis revealed insightful patterns, including variations in question preferences across age groups and a strong correlation between question types and answer strategies. MentalQA offers a valuable foundation for developing Arabic text mining tools capable of supporting mental health professionals and individuals seeking information.
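For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' Kappa can be computed from raw annotations. It is not the authors' code. It assumes each annotator assigns exactly one label per item (the abstract does not say whether the actual annotation was single-label), and the function name `fleiss_kappa` and the toy ratings are hypothetical. Only the six question-type names come from the abstract.

```python
from collections import Counter

# Question types as named in the abstract.
QUESTION_TYPES = [
    "diagnosis", "treatment", "anatomy & physiology",
    "epidemiology", "healthy lifestyle", "provider choice",
]


def fleiss_kappa(ratings, categories):
    """Fleiss' Kappa for a list of items, each a list with one label per annotator.

    Assumption: every item is rated by the same number of annotators (>= 2),
    and each annotator gives exactly one label per item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        return 1.0  # degenerate case: every rating is the same single category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: two questions, three annotators each.
toy = [
    ["diagnosis", "diagnosis", "treatment"],
    ["treatment", "treatment", "treatment"],
]
kappa = fleiss_kappa(toy, QUESTION_TYPES)
```

The statistic compares observed pairwise agreement with the agreement expected by chance given the overall label distribution, so a skewed label distribution lowers the score even when annotators often agree.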

Paper Structure (19 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: An overview of the creation of Arabic MentalQA dataset, starting from data collection, followed by detailed data annotation and the definition of tasks.
  • Figure 2: Example of two annotated Q&A posts, with each Q&A post translated into English for better readability. The first row represents the questions, while the second row represents the corresponding answers. Additionally, the categories for each question and answer are included.
  • Figure 3: Relationship between Q types and A strategies.
  • Figure 4: The most asked question types by patients’ gender.
  • Figure 5: The most asked question types by patients’ age.
  • ...and 2 more figures