Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation

Tong Li, Shu Yang, Junchao Wu, Jiyao Wei, Lijie Hu, Mengdi Li, Derek F. Wong, Joshua R. Oltmanns, Di Wang

TL;DR

The paper examines whether large language models can detect implicit suicidal ideation and respond with safe, effective support. It introduces DeepSuiMind, a psychologically grounded dataset built on the Death/Suicide Implicit Association Test (D/S-IAT), Negative Automatic Thinking, and real-world stressors, coupled with a psychology-informed evaluation framework that uses distress-aware prompts. In experiments with eight LLMs, the authors find substantial gaps in both the Identification of Implicit Suicidal ideation (IIS) and the quality of the Provision of Appropriate Supportive responses (PAS) for implicit cues, though distress-aware prompting can improve recognition. The results highlight safety and evaluation gaps in current models and underscore the need for stronger, theory-driven safety benchmarks and model design for sensitive mental-health applications.

Abstract

We present a comprehensive evaluation framework for assessing Large Language Models' (LLMs) capabilities in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce DeepSuiMind, a novel dataset of 1,308 test cases built upon psychological frameworks including D/S-IAT and Negative Automatic Thinking, alongside real-world scenarios. Through extensive experiments with 8 widely used LLMs under different contextual settings, we find that current models struggle significantly with detecting implicit suicidal ideation and providing appropriate support, highlighting crucial limitations in applying LLMs to mental health contexts. Our findings underscore the need for more sophisticated approaches in developing and evaluating LLMs for sensitive psychological applications.

Paper Structure

This paper contains 29 sections, 3 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Real-world examples of LLM handling of suicide-related dialogues: (a) for explicit mentions, LLMs provide appropriate support; (b) for implicit ideation (hopelessness, numbness, despair), the model shows low sensitivity to severe emotional cues, offers vague validation, and reinforces hopelessness; (c) in a chat with a suicidal teen bonded to a role-play AI, the model misses a farewell signal, deepens dependency, and omits real-world support.
  • Figure 2: Process and illustration of constructing implicit suicidal ideation data and our evaluation strategies.
  • Figure 3: Comparison of Model Performance Distributions and Five-Dimensional Evaluation Scores. Left: Box plots show total response scores across models under SS and DS prompting for both implicit and explicit cases. Right: Average scores on five evaluation dimensions. Solid bars indicate implicit cases; striped bars represent explicit cases.
  • Figure 4: Radar chart comparison of models across multiple dimensions.
  • Figure 5: Classification prompts used to categorize different types of suicidal ideation based on the Death/Suicide Implicit Association Test framework. These prompts define three distinct patterns of suicidal thinking: self-associated death ideation (Death-Me), disassociation from life (Life-Not Me), and projection of death ideation through others (Death-Not Me).
  • ...and 3 more figures