Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities
Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman
TL;DR
The paper tackles the challenge of faithfully aligning large language models (LLMs) to online communities and rigorously assessing the fidelity of that alignment across multiple linguistic dimensions. It introduces an unsupervised, scalable pipeline that constructs instruction-response demonstrations from community data, finetunes LLMs (e.g., Llama-3) to mimic the target discourse, and generates synthetic corpora for evaluation along four dimensions: authenticity, emotional tone, toxicity, and harm. The authors validate the approach through a case study on dieting and body-image communities, showing that finetuned models replicate community language and harm profiles better than in-context baselines. They also demonstrate potential for automated moderation and public-health insights by administering eating disorder (ED) screening instruments to the aligned models. The work highlights practical implications for social science research and platform safety while acknowledging limitations related to dataset bias, temporal shifts, artifacts from synthetic data, and ethical considerations surrounding harm assessment and diagnosis. Overall, it provides a scalable framework for constructing high-fidelity digital representations of online communities and for using them in monitoring, research, and policy support in sensitive domains such as eating disorders.
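To make the first pipeline step concrete, the following is a minimal sketch of how instruction-response demonstrations might be built from raw community posts. It is an illustration under stated assumptions, not the authors' implementation: the instruction wording, the `build_demonstrations` and `to_jsonl` helpers, the `min_chars` filter, and the example community name are all hypothetical.

```python
import json

# Hypothetical instruction template; the paper's actual prompt wording is not given here.
INSTRUCTION_TEMPLATE = "Write a post in the style of the '{community}' community."


def build_demonstrations(posts_by_community, min_chars=20):
    """Turn raw community posts into instruction-response pairs for finetuning.

    posts_by_community maps a community name to a list of post strings.
    min_chars is an assumed filter that drops very short posts.
    """
    demos = []
    for community, posts in posts_by_community.items():
        for post in posts:
            text = " ".join(post.split())  # collapse runs of whitespace
            if len(text) < min_chars:
                continue  # skip posts too short to carry community style
            demos.append({
                "instruction": INSTRUCTION_TEMPLATE.format(community=community),
                "response": text,
            })
    return demos


def to_jsonl(demos):
    """Serialize demonstrations as JSON Lines, one pair per line."""
    return "\n".join(json.dumps(d, ensure_ascii=False) for d in demos)


if __name__ == "__main__":
    sample = {"healthy_eating": ["Meal prep   on Sundays keeps my week on track.", "ok"]}
    print(to_jsonl(build_demonstrations(sample)))
```

No labels are required at this stage, which is what makes the construction unsupervised: the community identity supplies the instruction and the post itself supplies the response.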
Abstract
Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remain a challenge. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and comprehensively evaluating alignment across various aspects of language, including authenticity, emotional tone, toxicity, and harm. We demonstrate the utility of our approach by applying it to online communities centered on dieting and body image. We administer an eating disorder psychometric test to the aligned LLMs to reveal unhealthy beliefs and successfully differentiate communities with varying levels of eating disorder risk. Our results highlight the potential of LLMs in automated moderation and broader applications in public health and social science research.
