MAQuA: Adaptive Question-Asking for Multidimensional Mental Health Screening using Item Response Theory

Vasudha Varadarajan, Hui Xu, Rebecca Astrid Boehme, Mariam Marlan Mirstrom, Sverker Sikstrom, H. Andrew Schwartz

TL;DR

MAQuA advances mental health screening by integrating multidimensional IRT (MIRT) with language-based responses and factor analysis to drive adaptive, information-maximizing questioning. It demonstrates that joint multi-task modeling across ten conditions, coupled with $D$-optimality-based item selection and discretization for MIRT, yields accurate cross-diagnostic scores while substantially reducing the question burden (up to $50$–$85\%$ fewer questions). The approach shows robust performance across internalizing and externalizing domains and offers a path toward efficient, interactive screening within real-world clinical workflows using LLM-based agents. The work also provides a new questionnaire-driven dataset to support future research, and the authors aim to extend validation to diverse populations and real-time settings.
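The $D$-optimality criterion mentioned above can be illustrated with a small sketch. It assumes a compensatory two-parameter logistic (2PL) MIRT model with two factors (as in Figure A.2) and binary, discretized responses. Under that assumption each item contributes a Fisher information matrix $P(1-P)\,\mathbf{a}\mathbf{a}^\top$ at the current trait estimate $\boldsymbol{\theta}$, and the next question is the unasked item that maximizes the determinant of the accumulated information. The function names, the identity prior, and the item parameters are hypothetical illustrations, not MAQuA's actual implementation or fitted values.

```python
import math

def prob_endorse(a, d, theta):
    """Compensatory 2PL MIRT: P(x = 1 | theta) = logistic(a . theta + d)."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def item_information(a, d, theta):
    """2x2 Fisher information matrix P(1 - P) a a^T of one item at theta."""
    p = prob_endorse(a, d, theta)
    w = p * (1.0 - p)
    return [[w * a[i] * a[j] for j in range(2)] for i in range(2)]

def add2(m1, m2):
    """Element-wise sum of two 2x2 matrices."""
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def select_next_item(items, asked, theta, prior_precision=1.0):
    """Hypothetical helper: index of the unasked item that maximizes
    det(prior + information of asked items + information of candidate)."""
    # Information already accumulated: an identity-scaled prior plus asked items.
    total = [[prior_precision, 0.0], [0.0, prior_precision]]
    for idx in asked:
        a, d = items[idx]
        total = add2(total, item_information(a, d, theta))
    best_idx, best_det = None, -math.inf
    for idx, (a, d) in enumerate(items):
        if idx in asked:
            continue
        candidate_det = det2(add2(total, item_information(a, d, theta)))
        if candidate_det > best_det:
            best_idx, best_det = idx, candidate_det
    return best_idx

# Hypothetical item bank: (loadings on factors 1 and 2, intercept d).
ITEMS = [
    ((1.5, 0.0), 0.0),  # loads on factor 1 only
    ((1.4, 0.1), 0.0),  # mostly factor 1
    ((0.0, 1.2), 0.0),  # loads on factor 2 only
]

print(select_next_item(ITEMS, asked=set(), theta=(0.0, 0.0)))  # -> 0
print(select_next_item(ITEMS, asked={0}, theta=(0.0, 0.0)))    # -> 2
```

Once item 0 has covered factor 1, the criterion picks the factor-2 item even though item 1 has the larger overall loading. Maximizing the determinant rewards information on dimensions that are still poorly measured.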

Abstract

Recent advances in large language models (LLMs) offer new opportunities for scalable, interactive mental health assessment, but excessive querying by LLMs burdens users and is inefficient for real-world screening across transdiagnostic symptom profiles. We introduce MAQuA, an adaptive question-asking framework for simultaneous, multidimensional mental health screening. Combining multi-outcome modeling on language responses with item response theory (IRT) and factor analysis, MAQuA selects, at each turn, the question whose response is expected to be most informative across multiple dimensions. This optimizes diagnostic information, improving accuracy and potentially reducing response burden. Empirical results on a novel dataset show that MAQuA reduces the number of assessment questions required for score stabilization by 50–87% compared to random ordering (e.g., achieving stable depression scores with 71% fewer questions and eating disorder scores with 85% fewer questions). MAQuA performs robustly across both internalizing (depression, anxiety) and externalizing (substance use, eating disorder) domains, and early stopping strategies further reduce patient time and burden. These findings position MAQuA as a powerful and efficient tool for scalable, nuanced, and interactive mental health screening, advancing the integration of LLM-based agents into real-world clinical workflows.
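Score stabilization and the reported percentage reductions can be sketched as follows. The sketch assumes, following the description in Figure 2, that stabilization is detected when the rolling standard deviation of the correlation curve falls below a threshold. It returns the first question count at which this happens, then converts two stabilization points into a percentage reduction. The window size, threshold, function names, and toy correlation curves are hypothetical and are not values or results from the paper.

```python
import statistics

def stabilization_point(correlations, window=3, threshold=0.01):
    """Hypothetical helper: number of questions asked (1-based) at the end of the
    first window whose correlations have a standard deviation below threshold.
    Returns None if the curve never stabilizes."""
    for end in range(window, len(correlations) + 1):
        recent = correlations[end - window:end]
        if statistics.stdev(recent) < threshold:
            return end
    return None

def percent_reduction(adaptive_point, random_point):
    """Percentage fewer questions needed by the adaptive ordering."""
    return 100.0 * (1.0 - adaptive_point / random_point)

# Toy Pearson r after each question (not data from the paper).
ADAPTIVE = [0.30, 0.50, 0.60, 0.62, 0.62, 0.63, 0.63, 0.63]
RANDOM = [0.10, 0.20, 0.25, 0.35, 0.40, 0.45, 0.50, 0.55, 0.58, 0.61, 0.62, 0.62]

a_point = stabilization_point(ADAPTIVE)
r_point = stabilization_point(RANDOM)
print(a_point, r_point, percent_reduction(a_point, r_point))  # -> 6 12 50.0
```

With these toy curves the adaptive ordering stabilizes after 6 questions and the random ordering after 12, a reduction of $1 - 6/12 = 50\%$.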

Paper Structure

This paper contains 21 sections, 5 figures, 10 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Single-task models predict one mental health condition score from the language responses to the general questions and the condition-specific questions. Multi-task models take in all language responses and predict all mental health scores simultaneously.
  • Figure 2: Pearson correlations of MAQuA-estimated scores as a function of the number of questions asked, along with the rolling standard deviation of the correlations. The vertical line marks the point at which the estimate stabilizes, defined by a threshold on the rolling standard deviation. Our adaptive method consistently stabilizes within at most 50% of the number of questions required by random ordering. GPT-4 also shows promise in estimating factor 2 (externalizing disorders: substance use, alcohol use, and eating disorder). For more details, see the appendix on stabilization points.
  • Figure 3: Cohen's d with respect to reported diagnoses for our best multi-task model and for the validated clinical scores (treated as ground truth in the modeling). An asterisk (*) indicates that one score is significantly more strongly associated with the reported diagnosis than the other.
  • Figure A.1: The number of participants in the dataset that reported diagnosis for each of the conditions.
  • Figure A.2: Question texts loading onto the two factors.
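Figure 3 reports Cohen's d between participants who do and do not report a diagnosis. Below is a minimal sketch of the standard pooled-standard-deviation form of Cohen's d. Whether the paper uses exactly this pooling is an assumption, and the helper name, toy scores, and diagnosis flags are hypothetical. Applying the same function to model-estimated scores and to the validated questionnaire scores would give the two effect sizes that Figure 3 compares. The significance test behind the asterisks is not shown.

```python
import math
import statistics

def cohens_d(scores, diagnosed):
    """Hypothetical helper: Cohen's d (pooled SD) between participants who
    report a diagnosis (diagnosed=True) and those who do not."""
    pos = [s for s, dx in zip(scores, diagnosed) if dx]
    neg = [s for s, dx in zip(scores, diagnosed) if not dx]
    n1, n2 = len(pos), len(neg)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(pos)
                  + (n2 - 1) * statistics.variance(neg)) / (n1 + n2 - 2)
    return (statistics.mean(pos) - statistics.mean(neg)) / math.sqrt(pooled_var)

# Toy scores for six participants (not data from the paper).
scores = [3, 4, 5, 1, 2, 3]
diagnosed = [True, True, True, False, False, False]
print(cohens_d(scores, diagnosed))  # -> 2.0
```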