Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use

Mohit Chandra; Siddharth Sriraman; Gaurav Verma; Harneet Singh Khanuja; Jose Suarez Campayo; Zihang Li; Michael L. Birnbaum; Munmun De Choudhury

TL;DR

This work introduces the Psych-ADR benchmark and the ADRA framework to systematically evaluate how large language models (LLMs) detect adverse drug reactions (ADRs) related to psychiatric medications and respond with expert-aligned harm-reduction guidance. Across 239 Reddit posts, LLMs show meaningful but incomplete capability: ADR detection reaches about 77% accuracy in the best cases, yet many models misclassify ADR types and exhibit risk-averse biases. In alignment tasks, LLMs struggle to produce readable, actionable, physician-aligned harm-reduction strategies (HRS): they reach at most 70.86% agreement with experts on HRS, and their advice is generally less actionable than clinician responses. The study highlights the need to incorporate lived experience and domain-specific alignment into high-risk AI systems, and provides a benchmark and framework to drive future improvements in medical dialogue AI.

Abstract

Adverse Drug Reactions (ADRs) from psychiatric medications are the leading cause of hospitalizations among mental health patients. With healthcare systems and online communities facing limitations in resolving ADR-related issues, Large Language Models (LLMs) have the potential to fill this gap. Despite the increasing capabilities of LLMs, past research has not explored their capabilities in detecting ADRs related to psychiatric medications or in providing effective harm reduction strategies. To address this, we introduce the Psych-ADR benchmark and the Adverse Drug Reaction Response Assessment (ADRA) framework to systematically evaluate LLM performance in detecting ADR expressions and delivering expert-aligned mitigation strategies. Our analyses show that LLMs struggle with understanding the nuances of ADRs and differentiating between types of ADRs. While LLMs align with experts in terms of expressed emotions and tone of the text, their responses are more complex, harder to read, and only 70.86% aligned with expert strategies. Furthermore, they provide less actionable advice by a margin of 12.32% on average. Our work provides a comprehensive benchmark and evaluation framework for assessing LLMs in strategy-driven tasks within high-risk domains.
