From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data
Ugur Kursuncu, Trilok Padhi, Gaurav Sinha, Abdulkadir Erol, Jaya Krishna Mandivarapu, Christopher R. Larrison
TL;DR
This work systematically evaluates GPT-3.5 Turbo and Llama 2 (7B) for anxiety support, using authentic Reddit data for both prompting and fine-tuning and applying a domain-specific, psychotherapy-informed evaluation across linguistic quality, safety/trustworthiness, and supportiveness. The study combines automated metrics with clinician ratings and finds that fine-tuning improves readability and coherence but increases toxicity and bias. GPT-3.5 generally outperforms Llama 2 in linguistic quality, while Llama 2 behaves more conservatively. The findings highlight significant risks in using unfiltered social media data for domain adaptation and underscore the need for mitigation strategies, careful data curation, and hybrid approaches that preserve empathy and safety in mental health contexts. The work offers actionable guidance for researchers and practitioners integrating generative AI into anxiety support systems, including safety protocols, RLHF, and domain-grounded benchmarks. Future directions include incorporating knowledge-infused methods and partnering with clinical experts to improve safety, empathy, and long-term trust in multi-turn mental health interventions.
Abstract
The growing demand for accessible mental health support, compounded by workforce shortages and logistical barriers, has increased interest in using Large Language Models (LLMs) for scalable, real-time assistance. However, their use in sensitive domains such as anxiety support remains underexamined. This study presents a systematic evaluation of LLMs (GPT and Llama) for anxiety support, using real user-generated posts from the r/Anxiety subreddit for both prompting and fine-tuning. Our approach uses a mixed-method evaluation framework with three main categories of criteria: (i) linguistic quality, (ii) safety and trustworthiness, and (iii) supportiveness. Results show that fine-tuning LLMs on naturalistic anxiety-related data enhanced linguistic quality but increased toxicity and bias and diminished emotional responsiveness. While the LLMs exhibited limited empathy, GPT was rated as more supportive overall. Our findings highlight the risks of fine-tuning LLMs on unprocessed social media content without mitigation strategies.
