CRADLE Bench: A Clinician-Annotated Benchmark for Multi-Faceted Mental Health Crisis and Safety Risk Detection

Grace Byun, Rebecca Lipschutz, Sean T. Minton, Abigail Lott, Jinho D. Choi

TL;DR

CRADLE BENCH introduces a clinician-annotated, temporally aware benchmark covering seven types of high-risk mental health crisis, addressing the gap in reliably detecting emergencies during user-model interactions. It combines expert-labeled evaluation data with a training set of around 4K ensemble-labeled examples and evaluates 15 LLMs, revealing performance gaps and the benefits of majority-vote ensembles. The work shows how supervision from consensus and unanimous ensemble agreement can improve crisis detection models, and it provides fine-tuned open- and closed-source models for researchers. The dataset, models, and code are released to advance safe, clinically informed AI systems in crisis contexts, while acknowledging ethical considerations and limitations around safety filtering and recall. The study also discusses implications for timely intervention and risk mitigation in real-world applications.

Abstract

Detecting mental health crisis situations such as suicide ideation, rape, domestic violence, child abuse, and sexual harassment is a critical yet underexplored challenge for language models. When such situations arise during user-model interactions, models must reliably flag them, as failure to do so can have serious consequences. In this work, we introduce CRADLE BENCH, a benchmark for multi-faceted crisis detection. Unlike previous efforts that focus on a limited set of crisis types, our benchmark covers seven types defined in line with clinical standards and is the first to incorporate temporal labels. Our benchmark provides 600 clinician-annotated evaluation examples and 420 development examples, together with a training corpus of around 4K examples automatically labeled using a majority-vote ensemble of multiple language models, which significantly outperforms single-model annotation. We further fine-tune six crisis detection models on subsets defined by consensus and unanimous ensemble agreement, providing complementary models trained under different agreement criteria.
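
The majority-vote labeling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each model returns a set of labels per example, that a label is kept when more than half of the models predict it, and that "unanimous" means all models returned identical label sets. The function name `majority_vote` and the label strings are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label sets for one example.

    predictions: list of sets of label strings, one set per model.
    Returns (final_labels, unanimous):
      final_labels - sorted list of labels predicted by more than half of the models
      unanimous    - True if every model returned exactly the same label set
    Both rules are assumptions made for illustration.
    """
    if not predictions:
        raise ValueError("need at least one model prediction")
    n_models = len(predictions)
    # Count how many models predicted each label (each model counts once per label).
    counts = Counter(label for labels in predictions for label in set(labels))
    final_labels = sorted(label for label, c in counts.items() if c > n_models / 2)
    first = set(predictions[0])
    unanimous = all(set(p) == first for p in predictions)
    return final_labels, unanimous


# Hypothetical predictions from three models for a single example.
example = [
    {"suicide_ideation"},
    {"suicide_ideation", "domestic_violence"},
    {"suicide_ideation"},
]
labels, unanimous = majority_vote(example)
print(labels, unanimous)
```

Under these assumptions, a training subset restricted to unanimous agreement would keep only the examples for which `unanimous` is true, while a looser subset would keep every example with a majority-voted label set.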

Paper Structure

This paper contains 45 sections, 6 figures, 15 tables.

Figures (6)

  • Figure 1: Crisis type distribution visualization (ignoring past/ongoing) across all splits. Percentages are computed relative to the total number of labels.
  • Figure 2: Illustration of the ensemble method. Three LLMs predict labels for each instance, and the final labels are determined by majority voting.
  • Figure 3: Complete prompt used for LLM-based crisis annotation (Part 1 of 3).
  • Figure 4: Complete prompt used for LLM-based crisis annotation - Continued (Part 2 of 3).
  • Figure 5: Complete prompt used for LLM-based crisis annotation - Continued (Part 3 of 3).
  • ...and 1 more figure