Independent Clinical Evaluation of General-Purpose LLM Responses to Signals of Suicide Risk

Nick Judd, Alexandre Vaz, Kevin Paeth, Layla Inés Davis, Milena Esherick, Jason Brand, Inês Amaro, Tony Rousmaniere

TL;DR

This work addresses how general-purpose LLMs should respond to signals of suicide risk by applying a clinician-informed codebook in a multi-turn prompt–response setup. Drawing on 43 therapists and trainees and 829 annotated interactions with the open-source OLMo-2-32b model, the study uses mixed-effects logistic regression to quantify how risk factors influence model behavior. The model tends to withdraw as risk indicators accumulate and acknowledges risk inconsistently across risk factors; empathy and non-specific encouragement are common, while concrete resource referrals vary. The study provides a replicable methodology and argues for ongoing, open evaluation to guide the policy and design of safer, more effective mental health support tools.

Abstract

We introduce findings and methods to facilitate evidence-based discussion about how large language models (LLMs) should behave in response to user signals of risk of suicidal thoughts and behaviors (STB). People are already using LLMs as mental health resources, and several recent incidents implicate LLMs in mental health crises. Despite growing attention, few studies have been able to effectively generalize clinical guidelines to LLM use cases, and fewer still have proposed methodologies that can be iteratively applied as knowledge improves about the elements of human-AI interaction most in need of study. We introduce an assessment of LLM alignment with guidelines for ethical communication, adapted from clinical principles and applied to expressions of risk factors for STB in multi-turn conversations. Using a codebook created and validated by clinicians, mobilizing the volunteer participation of practicing therapists and trainees (N=43) based in the U.S., and using generalized linear mixed-effects models for statistical analysis, we assess a single fully open-source LLM, OLMo-2-32b. We show how to assess when a model deviates from clinically informed guidelines in a way that may pose a hazard and (thanks to its open nature) facilitates future investigation as to why. We find that contrary to clinical best practice, OLMo-2-32b, and, possibly by extension, other LLMs, will become less likely to invite continued dialog as users send more signals of STB risk in multi-turn settings. We also show that OLMo-2-32b responds differently depending on the risk factor expressed. This empirical evidence highlights that just as chatbots pose hazards if their responses reinforce delusions or assist in suicidal acts, they may also discourage further help-seeking or cause feelings of dismissal or abandonment by withdrawing from conversations when STB risk is expressed.
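
As a rough illustration of the statistical approach named above (a generalized linear mixed-effects model with a logistic link), the following sketch shows one plausible specification. Every symbol and the choice of predictors and random effects are assumptions made for illustration; this is not the paper's actual model.

```latex
% Illustrative sketch only; symbols and model structure are assumed, not taken from the paper.
% y_{ij} = 1 if response i, annotated by participant j, received a given annotation code.
\begin{align}
  y_{ij} \mid p_{ij} &\sim \mathrm{Bernoulli}(p_{ij}), \\
  \operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}}
    &= \beta_0 + \beta_{r(i)} + \gamma\, t_i + u_j, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2).
\end{align}
% r(i): risk factor expressed in the prompt that elicited response i (fixed effect, assumed)
% t_i:  number of risk signals sent so far in the conversation (fixed effect, assumed)
% u_j:  random intercept for participant j, absorbing annotator-level variation (assumed)
```

Under this hypothetical specification, a negative estimate of \gamma for a code such as "invites continued dialog" would correspond to the withdrawal pattern described in the abstract, and differences among the \beta_{r(i)} terms would correspond to responses that vary by risk factor.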

Paper Structure

This paper contains 25 sections, 4 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Overview of the experiment design. (A) Key risk factors for suicidal thoughts and behaviors (STB) are drawn from Franklin et al. (2017), a recent authoritative meta-analytic review (described in Table \ref{tab:risk_factors}). (B) Each of these risk factors is converted into a prompt template ("statement"), and statements are grouped into five random sequences representing "clients." (C) Participants role-play as each client and improvise on each statement in order, so as to prompt the model with an interaction that represents a key risk factor (the general interface is presented in Figure \ref{fig:interface_chat}). (D) Participants annotate each response according to the codebook (described in Section \ref{section:codebook}) in an annotation modal (additional detail in Appendix Figure \ref{fig:interface_annotation}).
  • Figure 2: Low-dimensional representation of prompts by risk factor (using t-SNE).
  • Figure 3: Probability of model response annotation, by annotation code and risk factor.
  • Figure 4: Probability of model response annotation by conversation length (for prior NSSI).
  • Figure 5: The overview of participant recruitment and participation stages. Participants were recruited from mailing lists for practicing mental health professionals and screened prior to participation. Participation starts with a guided session that also introduces the web application through which the user performs their interactive assessment and annotations, and concludes with an anonymized demographic survey.
  • ...and 6 more figures