ALBA: Adaptive Language-based Assessments for Mental Health
Vasudha Varadarajan, Sverker Sikström, Oscar N. E. Kjell, H. Andrew Schwartz
TL;DR
This work introduces Adaptive Language-Based Assessment (ALBA), a framework for adaptively selecting language-based prompts to assess mental health with few responses. It develops ALIRT, a semi-supervised, polytomous IRT approach, and an Actor-Critic model to dynamically order questions and score a latent depression/anxiety trait from limited language data. Across PHQ-9 and GAD-7 benchmarks, ALIRT yields the highest accuracy with far fewer questions (e.g., $r \approx 0.93$ after 3 items) and generally outperforms fixed baselines with lower computational cost. The findings show adaptive language-based assessments can maintain validity while reducing the linguistic burden, with practical implications for conversational diagnostic agents and scalable mental health screening.
Abstract
Mental health issues differ widely among individuals, with varied signs and symptoms. Recently, language-based assessments have shown promise in capturing this diversity, but they require a substantial sample of words per person for accuracy. This work introduces the task of Adaptive Language-Based Assessment (ALBA), which involves adaptively ordering questions while also scoring an individual's latent psychological trait using limited language responses to previous questions. To this end, we develop adaptive testing methods under two psychometric measurement theories: Classical Test Theory and Item Response Theory. We empirically evaluate ordering and scoring strategies, organizing them into two new methods: a semi-supervised item response theory-based method (ALIRT) and a supervised Actor-Critic model. While both methods improved over non-adaptive baselines, we found ALIRT to be the most accurate and scalable, achieving the highest accuracy with fewer questions (e.g., Pearson r ~ 0.93 after only 3 questions, compared with the at least 7 questions typically needed). In general, adaptive language-based assessments of depression and anxiety were able to use a smaller sample of language without compromising validity or incurring large computational costs.
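To make the idea of adaptive ordering under Item Response Theory concrete, the following is a minimal sketch of a generic adaptive test loop. It is not the ALIRT method: it assumes a textbook dichotomous two-parameter logistic (2PL) model rather than a polytomous, semi-supervised one, and it assumes each language response has already been reduced to a 0/1 score by a hypothetical `answer_fn`. The item parameters `a` (discrimination) and `b` (difficulty) and all function names are illustrative assumptions.

```python
import numpy as np

def p_endorse(theta, a, b):
    """2PL probability of a positive (1) response at trait level theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Fisher information of each 2PL item at trait level theta."""
    p = p_endorse(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def eap_estimate(responses, a, b, grid):
    """Posterior-mean trait estimate on a grid, standard normal prior."""
    log_post = -0.5 * grid ** 2
    for item, y in responses.items():
        p = p_endorse(grid, a[item], b[item])
        log_post = log_post + y * np.log(p) + (1 - y) * np.log(1.0 - p)
    weights = np.exp(log_post - log_post.max())
    return float(np.sum(grid * weights) / np.sum(weights))

def next_item(theta, a, b, asked):
    """Pick the unasked item that is most informative at the current estimate."""
    info = fisher_information(theta, a, b)
    for i in asked:
        info[i] = -np.inf  # never repeat a question
    return int(np.argmax(info))

def run_adaptive_test(answer_fn, a, b, n_questions=3):
    """Alternate between choosing a question and re-scoring the trait.

    answer_fn(item) is a hypothetical callback returning 0 or 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    grid = np.linspace(-4.0, 4.0, 161)
    responses, order, theta = {}, [], 0.0
    for _ in range(n_questions):
        item = next_item(theta, a, b, responses.keys())
        responses[item] = answer_fn(item)
        order.append(item)
        theta = eap_estimate(responses, a, b, grid)
    return order, theta
```

For example, `run_adaptive_test(lambda item: 1, a=[1.0, 2.0, 1.5, 0.8], b=[0.0, 0.1, 1.0, -1.0])` asks three distinct questions and returns their order together with the updated trait estimate. The design choice illustrated here is the interleaving of selection and scoring: each response updates the estimate, and the estimate determines which question is asked next.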
