In-context learning capabilities of Large Language Models to detect suicide risk among adolescents from speech transcripts

Filomene Roquefort, Alexandre Ducorroy, Rachid Riad

TL;DR

The paper tackles scalable detection of suicide risk in adolescents under privacy constraints by leveraging transcript-based analysis with in-context learning from Large Language Models, guided by the DSPy prompting framework. By evaluating multiple LLMs with zero-shot, few-shot (notably 4-shot with Gemma2-9b), and Chain-of-Thought prompts on the SW1 Chinese adolescent dataset, the authors achieve up to 0.68 accuracy on the test set and demonstrate robust effects of example count on performance. Statistical analysis reveals model-type and model-size interactions, with larger models offering higher baseline accuracy but diminishing gains from additional in-context examples ($R^2=0.134$). The work demonstrates a privacy-preserving, scalable pathway for automated suicide risk assessment from speech transcripts, while outlining future efforts to improve interpretability and clinical integration.

Abstract

Early suicide risk detection in adolescents is critical yet hindered by scalability challenges of current assessments. This paper presents our approach to the first SpeechWellness Challenge (SW1), which aims to assess suicide risk in Chinese adolescents through speech analysis. Due to speech anonymization constraints, we focused on linguistic features, leveraging Large Language Models (LLMs) for transcript-based classification. Using DSPy for systematic prompt engineering, we developed a robust in-context learning approach that outperformed traditional fine-tuning on both linguistic and acoustic markers. Our systems achieved third and fourth places among 180+ submissions, with 0.68 accuracy (F1=0.7) using only transcripts. Ablation analyses showed that increasing the number of prompt examples improved performance (p=0.003), with varying effects across model types and sizes. These findings advance automated suicide risk assessment and demonstrate LLMs' value in mental health applications.

Paper Structure

This paper contains 8 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: (Left) Overview of the three methods used for our submissions to the SpeechWellness Challenge. The first method employs direct audio processing using a pretrained audio encoder paired with a classifier. The second method uses a Large Language Model with a 4-shot classification approach. The third method extends this approach by adding a chain-of-thought instruction. (Right) Illustration of the few-shot prompting technique and the chain-of-thought reasoning. The transcriptions are fake for privacy reasons, but the reasoning path is extracted from a successful prediction of the LLM on the dev set.
  • Figure 2: Prompt template used in our submission to define the DSPy program for suicide risk detection from speech transcripts.
  • Figure 3: Confusion matrix for our best system on the test set.
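
The few-shot and chain-of-thought prompting described in the Figure 1 caption can be illustrated with a minimal sketch in plain Python. This is not the authors' DSPy program: the `Example` class, the `build_prompt` function, the label names `at_risk` / `not_at_risk`, the instruction wording, and the placeholder transcripts are all assumptions made for illustration. It only shows how a k-shot prompt (k = 4 in the paper's submissions) might be assembled, with an optional reasoning instruction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    transcript: str  # placeholder text, not real participant data
    label: str       # hypothetical label names: "at_risk" or "not_at_risk"


def build_prompt(examples: List[Example], query: str, k: int = 4,
                 chain_of_thought: bool = False) -> str:
    """Assemble a k-shot classification prompt, optionally asking for reasoning."""
    # Hypothetical instruction text; the paper's actual template is in its Figure 2.
    lines = ["Classify the transcript as at_risk or not_at_risk."]
    if chain_of_thought:
        lines.append("Reason step by step before giving the label.")
    # Include at most k labelled demonstrations (k=0 gives a zero-shot prompt).
    for ex in examples[:k]:
        lines.append(f"Transcript: {ex.transcript}")
        lines.append(f"Label: {ex.label}")
    # The query comes last; the model completes the final field.
    lines.append(f"Transcript: {query}")
    lines.append("Reasoning:" if chain_of_thought else "Label:")
    return "\n".join(lines)


# Made-up placeholder demonstrations, alternating the two hypothetical labels.
demos = [
    Example(f"<placeholder transcript {i}>", "at_risk" if i % 2 else "not_at_risk")
    for i in range(6)
]
prompt = build_prompt(demos, "<new transcript>", k=4, chain_of_thought=True)
print(prompt)
```

Setting `k=0` yields the zero-shot variant, and `chain_of_thought=False` yields the plain 4-shot variant. In a framework such as DSPy, the selection of demonstrations and the prompt layout are handled by the library rather than written by hand.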