Exposure to Content Written by Large Language Models Can Reduce Stigma Around Opioid Use Disorder in Online Communities

Shravika Mittal, Darshi Shah, Shin Won Do, Mai ElSherief, Tanushree Mitra, Munmun De Choudhury

TL;DR

This study tests whether content authored by large language models (LLMs) can reduce stigma around opioid use disorder (OUD) in online communities. In preregistered randomized controlled experiments, participants read LLM-generated, human-written, or no responses to OUD-related queries, either once or repeatedly over 14 days. Three outcomes were measured: attitudes toward medications for addiction treatment (MAT; DV1), toward people with OUD (DV2), and toward OUD itself (DV3). LLM content reduced MAT-related stigma (DV1) most effectively under both exposure setups, with occasional backfire effects for DV2 and DV3 and moderation by pre-existing attitudes; longitudinal exposure generally strengthened attitude change but worsened some DV2 outcomes. The findings support carefully deployed, education-based LLM interventions in online health discourse, while urging caution about one-off exposure and about unintended stigma reinforcement, which depends on context and baseline beliefs.

Abstract

Widespread stigma, in both offline and online spaces, acts as a barrier to harm reduction efforts in the context of opioid use disorder (OUD). This stigma is prominently directed towards clinically approved medications for addiction treatment (MAT), people with the condition, and the condition itself. Given the potential of artificial intelligence-based technologies in promoting health equity and facilitating empathic conversations, this work examines whether large language models (LLMs) can help abate OUD-related stigma in online communities. To answer this, we conducted a series of pre-registered randomized controlled experiments, where participants read LLM-generated, human-written, or no responses to help-seeking OUD-related content in online communities. The experiment was conducted under two setups, i.e., participants read the responses either once (N = 2,141) or repeatedly for 14 days (N = 107). We found that participants reported the least stigmatized attitudes toward MAT after consuming LLM-generated responses under both setups. This study offers insights into strategies that can foster inclusive online discourse on OUD; e.g., based on our findings, LLMs can be used as an education-based intervention to promote positive attitudes and increase people's propensity toward MAT.

Paper Structure

This paper contains 24 sections, 2 equations, 7 figures, 21 tables.

Figures (7)

  • Figure 1: Raw percentage change in attitudes toward MAT (DV1), people with OUD (DV2), and OUD (DV3), averaged across participants, for the Control, Human, and LLM interventions after (a) single and (b) longitudinal exposure setups. Percentage change in attitudes was computed as $\frac{(Y_{post} - Y_{pre})}{Y_{pre}} \times 100$; where $Y_{post}$ and $Y_{pre}$ represent the aggregated DV score post and pre intervention.
  • Figure 2: Analysis of the intervention content. Responses read by participants within the LLM and Human intervention groups were evaluated for (a), (b) emotional appeal, (c) readability, and (d) shared sense of belonging. Emotional appeal is reported using five relevant categories available in Empath (Fast et al., 2016), a lexicon-based tool; a higher score is indicative of a higher alignment to the category. Readability is reported using the Flesch-Kincaid Grade Level index (Kincaid et al., 1975); a lower score is indicative of simpler, more readable text. Shared sense of belonging is reported using the identity social dimension classifier (Choi et al., 2020), which quantifies in-group or community forward linguistic cues; the higher the score the better. Scores are averaged across all the responses read by participants during the single and longitudinal exposure setups. Mann-Whitney U-tests were performed to explore differences in score distributions for responses provided in the LLM and Human interventions. Statistically significant differences are noted with the test statistic and p-values ($p$): * ($p$ < $0.05$), ** ($p$ < $0.01$), or *** ($p$ < $0.001$). (e) At the end of our experiments, participants in the LLM and Human interventions rated the responses consumed during the respective interventions, using a 5-point Likert scale, as: (1) influential, responses offered a different approach to look at OUD; (2) credible, responses were reasonable and trustworthy; (3) informative, responses were knowledgeable; (4) resourceful, likely to refer to such responses to gain information about OUD; and (5) supportive, prefer to receive such responses if one had OUD. On finding no significant differences in ratings across single and longitudinal exposure setups, we combined participant ratings for the two setups and report a weighted average (weighted by the sample size). Mann-Whitney U-tests were performed to examine differences in score distributions for ratings provided by participants in the LLM and Human intervention groups. Statistically significant differences are noted with the p-values ($p$): * ($p$ < $0.05$), ** ($p$ < $0.01$), or *** ($p$ < $0.001$).
  • Figure 3: Pre- and post-intervention attitudes toward the three DVs for participants within the LLM intervention. (a), (b), and (c): single exposure setup; (d), (e), and (f): longitudinal exposure setup. Participants were divided into three groups -- low, medium, and high -- based on their pre-intervention attitudes. The Likert scale range (1 to 5) was equally divided into three parts to achieve this. The red lines indicate the regression line or the best linear fit for the pre-/post-intervention score distributions (m: slope, b: intercept). The gray dotted line represents the no-change post-intervention fit, i.e., $Y_{pre} = Y_{post}$. Participants (represented via dots) above (below) the gray line reported a higher (lower) attitude post intervention. Participants in the low (high) pre-intervention score category were more concentrated in the region above (below) the no-change post-intervention line fit.
  • Figure 4: Overview of our study workflow. Represents the four phases involved in Study (a): single exposure setup and Study (b): longitudinal exposure setup.
  • Figure S1: Overview of our method to gather content for the Human and LLM interventions. We used the Reddit-QA dataset and filtered it down to $112$ posts containing an OUD-related query. For the $112$ posts, we then considered the top-most voted comment as a proxy for the human-generated response (Human Intervention). We created a prompt (Table \ref{tab:gpt4-prompt}) to get LLM-generated responses, via GPT-4, for the $112$ posts (LLM Intervention).
  • ...and 2 more figures
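
The two computations described in the captions of Figure 1 and Figure 3 can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names, the sample scores, the choice to compute the percentage change per participant before averaging, and the handling of scores that fall exactly on a group boundary are all assumptions made for illustration.

```python
from statistics import mean


def percent_change(y_pre: float, y_post: float) -> float:
    """Percentage change (Y_post - Y_pre) / Y_pre * 100, as in the Figure 1 caption."""
    if y_pre == 0:
        raise ValueError("y_pre must be non-zero")
    return (y_post - y_pre) / y_pre * 100.0


def mean_percent_change(pairs: list[tuple[float, float]]) -> float:
    """Average the per-participant percentage change.

    Assumption: the change is computed per participant and then averaged;
    the caption does not spell out the order of aggregation.
    """
    return mean(percent_change(pre, post) for pre, post in pairs)


def baseline_group(y_pre: float, low: float = 1.0, high: float = 5.0) -> str:
    """Assign a pre-intervention score to low / medium / high.

    The Likert range [low, high] is split into three equal-width parts,
    as in the Figure 3 caption. Assumption: a score exactly on a cut
    point goes to the upper group.
    """
    width = (high - low) / 3
    if y_pre < low + width:
        return "low"
    if y_pre < low + 2 * width:
        return "medium"
    return "high"


# Hypothetical (pre, post) aggregated DV scores for three participants.
pairs = [(2.0, 3.0), (4.0, 4.0), (5.0, 4.0)]
average_change = mean_percent_change(pairs)
groups = [baseline_group(pre) for pre, _ in pairs]
print(average_change, groups)
```

With a 1-to-5 scale the cut points fall at roughly 2.33 and 3.67, so a pre-intervention score of 3.0 lands in the medium group.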