Harmful Suicide Content Detection

Kyumin Park; Myung Jae Baik; YeongJun Hwang; Yen Shin; HoJae Lee; Ruda Lee; Sang Min Lee; Je Young Hannah Sun; Ah Rah Lee; Si Yeun Yoon; Dong-ho Lee; Jihyung Moon; JinYeong Bak; Kyunghyun Cho; Jong-Woo Paik; Sungjoon Park

TL;DR

This work defines a novel harmful suicide content detection task that classifies online posts into five categories according to legality, harmfulness, and suicide relation, and it provides a multimodal Korean benchmark annotated by medical professionals. The benchmark contains 452 items spanning text, images, context, and metadata, and it is accompanied by a task description document that guides annotators and informs LLM-based moderation strategies. An English version of the benchmark, produced by GPT-4 translation, enables evaluation of open-source models. GPT-4 reaches F1 scores of around 66.46 on illegal content and 77.09 on harmful content, and the experiments show how task descriptions, input modalities, and few-shot examples influence performance. For moderation practice, the results indicate that instruction-based approaches can adapt to evolving criteria, while ethical safeguards and moderator review remain necessary for handling high-stress content. Together, these contributions outline how LLM-assisted moderation of harmful suicide content can be built on a clinically informed, ethics-conscious dataset and workflow.

Abstract

Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automatically detecting the harmfulness of content. To fill this gap, we introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels. We develop a multi-modal benchmark and a task description document in collaboration with medical professionals, and leverage large language models (LLMs) to explore efficient methods for moderating such content. Our contributions include proposing a novel detection task, a multi-modal Korean benchmark with expert annotations, and suggesting strategies using LLMs to detect illegal and harmful content. Owing to the potential harm involved, we publicize our implementations and benchmark, incorporating an ethical verification process.
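The five-category output and the moderation flow summarized in the Figure 1 and Figure 2 captions can be pictured with a short sketch. This is an illustration only, not the authors' implementation: the category names come from the Figure 2 caption, while the `Category` enum, the `route` and `resolve` functions, the action strings, and the exact mapping from category to action are assumptions made here for clarity.

```python
from enum import Enum


class Category(Enum):
    """Five output categories, named as in the Figure 2 caption."""
    ILLEGAL = "illegal"
    HARMFUL = "harmful"
    POTENTIALLY_HARMFUL = "potentially harmful"
    HARMLESS = "harmless"
    NON_SUICIDE = "non-suicide"


# Assumption: any category that signals possible illegality or harm
# is sent to a human moderator before any action is taken.
NEEDS_REVIEW = {
    Category.ILLEGAL,
    Category.HARMFUL,
    Category.POTENTIALLY_HARMFUL,
}


def route(predicted: Category) -> str:
    """Decide whether a model prediction needs moderator review."""
    return "moderator_review" if predicted in NEEDS_REVIEW else "no_action"


def resolve(confirmed: Category) -> str:
    """Map the moderator's confirmed category to a follow-up action.

    Assumption: illegal content leads to a legal report and harmful
    content to a removal request; everything else needs no action.
    """
    if confirmed is Category.ILLEGAL:
        return "legal_report"
    if confirmed is Category.HARMFUL:
        return "removal_request"
    return "no_action"


if __name__ == "__main__":
    for category in Category:
        print(category.value, "->", route(category))
```

Keeping the model's prediction (`route`) separate from the moderator's decision (`resolve`) mirrors the point made in the Figure 1 caption that a human reviews flagged content before any report or removal request is issued.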

Paper Structure (26 sections, 10 figures, 21 tables)

This paper contains 26 sections, 10 figures, 21 tables.

Introduction
Related Work
Suicide Content
Suicide Risk Detection
Harmful Suicide Content Detection
Harmful Suicide Content Detection - Input
Harmful Suicide Content Detection - Output
Moderator Review
Harmful Suicide Content Benchmark
Suicide Content Collection
Preprocessing
Annotation
Harmful Suicide Content Benchmark
Experiment
Leveraging Task Description
...and 11 more sections

Figures (10)

Figure 1: Moderation system for harmful suicide content detection, categorizing online user-generated content into five classes by legality, harmfulness and suicide relation. A moderator reviews content with potential illegality or harm, leading to legal reporting or content removal requests. No action is taken if no risks are found.
Figure 2: Benchmark examples for each category (illegal, harmful, potentially harmful, harmless, and non-suicide) in harmful suicide content detection.
Figure 3: Qualitative analysis of benchmark translation results. The source text is the content text from the Korean benchmark data, and the proper translation is a human translation that preserves the meaning. The translation result is the model output used in the English benchmark. Red words mark the parts of the model's output where translation errors occurred.
Figure 4: Results from the few-shot example experiment. Increasing the number of examples raises illegal F1 and recall, with the 5-shot setting achieving peak performance on the illegal metric.
Figure 5: Results from the Korean benchmark experiment. Hatched bars indicate the Korean LLM (Clova X). Although Clova X has lower overall performance compared to GPTs, it excels in harmful recall.
...and 5 more figures
