ALBA: Adaptive Language-based Assessments for Mental Health

Vasudha Varadarajan, Sverker Sikström, Oscar N. E. Kjell, H. Andrew Schwartz

TL;DR

This work introduces Adaptive Language-Based Assessment (ALBA), a framework for adaptively selecting language-based prompts to assess mental health with few responses. It develops ALIRT, a semi-supervised, polytomous IRT approach, and an Actor-Critic model to dynamically order questions and score a latent depression/anxiety trait from limited language data. Across PHQ-9 and GAD-7 benchmarks, ALIRT yields the highest accuracy with far fewer questions (e.g., $r \approx 0.93$ after 3 items) and generally outperforms fixed baselines with lower computational cost. The findings show adaptive language-based assessments can maintain validity while reducing the linguistic burden, with practical implications for conversational diagnostic agents and scalable mental health screening.

Abstract

Mental health issues differ widely among individuals, with varied signs and symptoms. Recently, language-based assessments have shown promise in capturing this diversity, but they require a substantial sample of words per person for accuracy. This work introduces the task of Adaptive Language-Based Assessment (ALBA), which involves adaptively ordering questions while also scoring an individual's latent psychological trait using limited language responses to previous questions. To this end, we develop adaptive testing methods under two psychometric measurement theories: Classical Test Theory and Item Response Theory. We empirically evaluate ordering and scoring strategies, organizing them into two new methods: a semi-supervised item response theory-based method, ALIRT, and a supervised Actor-Critic model. While both methods improved over non-adaptive baselines, ALIRT was the most accurate and scalable, achieving the highest accuracy with fewer questions (e.g., Pearson r ≈ 0.93 after only 3 questions, compared with typically needing at least 7). In general, adaptive language-based assessments of depression and anxiety used a smaller sample of language without compromising validity or incurring large computational costs.
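The abstract describes the core ALBA loop: after each response, re-estimate the latent trait, then ask the question expected to be most informative. The sketch below illustrates that loop under stated assumptions; it is not the authors' ALIRT implementation. It assumes Samejima's graded response model (a common polytomous IRT model) with already-fitted item parameters, expected a posteriori (EAP) scoring on a grid with a standard-normal prior, and maximum Fisher information item selection. The item bank, parameter values, and the `answer_fn` callback are hypothetical; in ALBA the observed category would come from polytomizing a language-based score rather than from a Likert answer.

```python
import math

# Ability grid for EAP scoring: -4.0 to 4.0 in steps of 0.05.
GRID = [-4.0 + 0.05 * i for i in range(161)]


def boundary_probs(theta, a, thresholds):
    """Graded response model boundaries P(X >= k | theta) for k = 0..K.

    `thresholds` must be strictly increasing (K - 1 values for K categories).
    The first entry is fixed at 1 and the last at 0.
    """
    stars = [1.0]
    for b in thresholds:
        stars.append(1.0 / (1.0 + math.exp(-a * (theta - b))))
    stars.append(0.0)
    return stars


def category_probs(theta, a, thresholds):
    """P(X = k | theta) for k = 0..K-1, as differences of adjacent boundaries."""
    s = boundary_probs(theta, a, thresholds)
    return [s[k] - s[k + 1] for k in range(len(s) - 1)]


def item_information(theta, a, thresholds):
    """Fisher information of one item at theta: sum_k (dP_k/dtheta)^2 / P_k."""
    s = boundary_probs(theta, a, thresholds)
    ds = [a * p * (1.0 - p) for p in s]  # logistic derivative; 0 at the fixed ends
    info = 0.0
    for k in range(len(s) - 1):
        p = s[k] - s[k + 1]
        dp = ds[k] - ds[k + 1]
        if p > 1e-12:
            info += dp * dp / p
    return info


def eap_estimate(responses, items):
    """Posterior mean of theta given {item_id: observed category}, N(0, 1) prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized standard-normal prior
        for item_id, k in responses.items():
            a, thresholds = items[item_id]
            w *= category_probs(t, a, thresholds)[k]
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total


def next_item(theta, items, asked):
    """Pick the not-yet-asked item with maximum information at the current theta."""
    remaining = [i for i in items if i not in asked]
    return max(remaining, key=lambda i: item_information(theta, *items[i]))


def adaptive_session(items, answer_fn, n_questions):
    """Alternate item selection and EAP re-scoring for n_questions steps."""
    responses = {}
    theta = 0.0  # prior mean before any response
    for _ in range(n_questions):
        item = next_item(theta, items, responses)
        responses[item] = answer_fn(item)
        theta = eap_estimate(responses, items)
    return theta, list(responses)


# Hypothetical 4-category item bank: item_id -> (discrimination, thresholds).
ITEMS = {
    "q1": (1.5, [-1.0, 0.0, 1.0]),
    "q2": (0.8, [-0.5, 0.5, 1.5]),
    "q3": (2.0, [0.0, 0.5, 1.0]),
    "q4": (1.2, [-1.5, -0.5, 0.5]),
}

# A hypothetical respondent whose every answer falls in the highest category.
theta_hat, order = adaptive_session(ITEMS, lambda item: 3, n_questions=3)
print(order, round(theta_hat, 2))
```

Here the respondent ends with a positive trait estimate after three questions. Maximum-information selection favors high-discrimination items whose thresholds sit near the current estimate, which is the usual rationale for adaptive ordering reaching a stable score in fewer questions than a fixed order.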

Paper Structure (26 sections, 5 equations, 4 figures, 6 tables, 2 algorithms)

Figures (4)

  • Figure 1: The ALBA task: the system picks the most informative question to ask based on previous responses, much as a therapist would in practice. To do this, we introduce an IRT-based semi-supervised method, ALIRT, and an Actor-Critic model, and compare their performance, using a limited set of language-response questions, against self-reported depression and anxiety questionnaire scores (PHQ-9 and GAD-7).
  • Figure 2: Distribution of depression and anxiety scores of participants in the dataset described in the Dataset section.
  • Figure 3: The correlation of the latent scores with the "true" (PHQ-9) scores for various polytomization levels across the number of items. The 12-tomous model is likely overfit and offers no significant advantage over our initial choice of 8 levels.
  • Figure 4: Flowchart of the items picked at the $n^{th}$ question using ALIRT. The selection of questions for the first few items is rather sparse. Since the latent variable estimate already correlates highly with the classical psychometric measures within 3-5 questions, this suggests that some questions contribute little to the psychometric measure despite their high individual feature correlations.
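Figure 3 compares polytomization levels (for example 8 versus 12 ordered categories), but the captions do not say how continuous language-based scores are mapped onto those levels. One simple option, assumed here purely for illustration, is equal-frequency binning: estimate k - 1 cut points from a training sample's scores, then assign each new score the index of the bin it falls into. The function names and the synthetic scores are hypothetical.

```python
import bisect
import statistics


def fit_cut_points(train_scores, levels):
    """Return levels - 1 cut points at equal-frequency quantiles of the training scores."""
    return statistics.quantiles(train_scores, n=levels)


def polytomize(score, cut_points):
    """Map a continuous score to an ordinal category in 0..len(cut_points)."""
    return bisect.bisect_right(cut_points, score)


# Synthetic continuous scores (stand-ins for per-item language-based predictions).
train = [0.1 * i for i in range(80)]
cuts = fit_cut_points(train, levels=8)
categories = [polytomize(s, cuts) for s in train]
print(len(cuts), min(categories), max(categories))  # prints: 7 0 7
```

Equal-frequency bins keep every category populated, which helps when fitting a polytomous model. More levels preserve finer distinctions but leave fewer observations per category, which is one plausible reading of the caption's concern that a 12-level model may overfit.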