Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

Ziyun Cui; Chang Lei; Wen Wu; Yinan Duan; Diyang Qu; Ji Wu; Runsen Chen; Chao Zhang

TL;DR

This paper studies automatic suicide risk detection from the spontaneous speech of adolescents, using a newly collected Mandarin dataset of 15 hours of speech from more than a thousand adolescents aged ten to eighteen.

Abstract

Early detection of suicide risk is important because it enables intervention before a potential suicide attempt. This paper studies the automatic detection of suicide risk from the spontaneous speech of adolescents, and collects a Mandarin dataset of 15 hours of speech from more than a thousand adolescents aged ten to eighteen for the experiments. To leverage the diverse acoustic and linguistic features embedded in spontaneous speech, both the Whisper speech model and textual large language models (LLMs) are used for suicide risk detection. Both all-parameter finetuning and parameter-efficient finetuning are used to adapt the pre-trained models to the task, and multiple audio-text fusion approaches are evaluated for combining the representations of Whisper and the LLM. The proposed system achieves a detection accuracy of 0.807 and an F1-score of 0.846 on a test set of 119 subjects, indicating promising potential for real suicide risk detection applications.

Paper Structure (15 sections, 2 figures, 6 tables)

This paper contains 15 sections, 2 figures, 6 tables.

Introduction
Suicide Data
Automatic Suicide Risk Detection
Speech foundation model: Whisper
Large Language Models
Finetuning strategy
Fusion methods
Experiment Setup
Baselines
Implementation Details
Results and Discussion
Comparison of Different Tuning and Fusion Strategies
Analysis of Prompt Influence when Finetuning
Conclusions
Acknowledgement

Figures (2)

Figure 1: Overall pipeline. Acoustic and text features are fused, and a classifier then detects suicide risk from the fused representation.
Figure 2: Model structure of the two fusion methods. For concatenation, the speech and text features are extracted separately and pooled before being concatenated. For in-context fusion, the speech embedding is mapped to the LLM's hidden space and concatenated with the embedded tokens before being fed into the LLM's decoder layers.
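
The two fusion methods named in the Figure 2 caption can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the caption does not say which pooling is used, how the speech embedding is mapped, or in what order the sequences are joined, so mean pooling, a single linear projection, and placing the speech frames before the tokens are all assumptions made here for illustration. The function names and toy vectors are hypothetical.

```python
def mean_pool(frames):
    """Average a sequence of equal-length vectors into one vector (assumed pooling)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def concat_fusion(speech_frames, text_frames):
    """Concatenation fusion: pool each modality separately, then join the pooled vectors."""
    return mean_pool(speech_frames) + mean_pool(text_frames)


def project(frames, weight):
    """Map each frame from the speech dimension to the LLM hidden dimension.

    `weight` has shape (speech_dim, llm_dim); a single linear map is an assumption.
    """
    columns = list(zip(*weight))
    return [[sum(x * w for x, w in zip(frame, col)) for col in columns]
            for frame in frames]


def in_context_fusion(speech_frames, token_embeddings, weight):
    """In-context fusion: projected speech frames and token embeddings are joined
    along the sequence axis (speech first, by assumption) as decoder input."""
    return project(speech_frames, weight) + token_embeddings


# Toy inputs: 2 speech frames of dimension 2, 2 tokens of dimension 3.
speech = [[1.0, 2.0], [3.0, 4.0]]
tokens = [[1.0, 0.0, 1.0], [3.0, 2.0, 1.0]]
weight = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # hypothetical 2 x 3 projection

pooled = concat_fusion(speech, tokens)                # one vector of length 2 + 3
sequence = in_context_fusion(speech, tokens, weight)  # 4 vectors of length 3
```

The sketch highlights the structural difference: concatenation fusion yields a single fixed-size vector for a classifier, while in-context fusion yields a longer sequence that the LLM processes jointly.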
