Harmful Suicide Content Detection
Kyumin Park, Myung Jae Baik, YeongJun Hwang, Yen Shin, HoJae Lee, Ruda Lee, Sang Min Lee, Je Young Hannah Sun, Ah Rah Lee, Si Yeun Yoon, Dong-ho Lee, Jihyung Moon, JinYeong Bak, Kyunghyun Cho, Jong-Woo Paik, Sungjoon Park
TL;DR
This work defines a novel harmful suicide content detection task that classifies online posts into five levels of illegality, harmfulness, and suicide relation, and introduces a multimodal Korean benchmark annotated by medical professionals. The benchmark comprises 452 items spanning text, images, context, and metadata, and is accompanied by a task description document that guides annotators and informs LLM-based moderation strategies; an English version produced by GPT-4 translation enables evaluation of open-source models. GPT-4 achieves F1 scores of 66.46 and 77.09 in detecting illegal and harmful content, respectively, and the study analyzes how task descriptions, input modalities, and few-shot examples influence performance. It also discusses practical moderation implications, showing how instruction-based approaches can adapt to evolving criteria, while emphasizing ethical safeguards and the need for moderator review to manage high-stress content. Together, these contributions offer a blueprint for deploying LLM-assisted moderation of harmful suicide content with a clinically informed, ethics-conscious dataset and workflow.
Abstract
Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, particularly in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding the negative effects of such content or the suicide risk of individuals, rather than on automatically detecting the harmfulness of content. To fill this gap, we introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels. We develop a multi-modal benchmark and a task description document in collaboration with medical professionals, and leverage large language models (LLMs) to explore efficient methods for moderating such content. Our contributions are a novel detection task, a multi-modal Korean benchmark with expert annotations, and strategies for using LLMs to detect illegal and harmful content. Given the potential harm involved, we release our implementations and benchmark subject to an ethical verification process.
