Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts

Chenghao Yang, Tuhin Chakrabarty, Karli R Hochstatter, Melissa N Slavin, Nabila El-Bassel, Smaranda Muresan

TL;DR

This work tackles the challenge of detecting self-disclosures of opioid use disorder (OUD) phases in Reddit posts by introducing a principled annotation framework and a 2500-post corpus annotated for six OUD phases plus span-level explanations. It systematically evaluates zero-shot, few-shot, and fully supervised modeling with and without explanations, showing that including explanations consistently boosts performance and that smaller models trained on novice annotations can outperform large, instruction-tuned models in supervised settings. The study also analyzes annotator disagreements and dataset uncertainty, highlighting their impact on model behavior and the importance of human-in-the-loop decision making in high-stakes health contexts. Overall, the approach enables more reliable identification of OUD-related disclosures in social media, with implications for targeted interventions and future longitudinal, user-level analyses.

Abstract

In the last decade, the United States has lost more than 500,000 people to overdoses involving prescription and illicit opioids, making it a national public health emergency (USDHHS, 2017). Medical practitioners require robust and timely tools that can effectively identify at-risk patients. Community-based social media platforms such as Reddit allow users to self-disclose and discuss otherwise sensitive drug-related behaviors. We present a moderate-size corpus of 2500 opioid-related posts from various subreddits, labeled with six different phases of opioid use: Medical Use, Misuse, Addiction, Recovery, Relapse, Not Using. For every post, we annotate span-level extractive explanations and, crucially, study their role in both annotation quality and model development. We evaluate several state-of-the-art models in supervised, few-shot, and zero-shot settings. Experimental results and error analysis show that identifying the phases of opioid use disorder is highly contextual and challenging. However, we find that using explanations during modeling leads to a significant boost in classification accuracy, demonstrating their beneficial role in a high-stakes domain such as studying the opioid use disorder continuum.

Paper Structure (37 sections, 1 figure, 14 tables)