Community-Aligned Behavior Under Uncertainty: Evidence of Epistemic Stance Transfer in LLMs
Patrick Gerard, Aiden Chang, Svitlana Volkova
TL;DR
This work asks whether community-aligned large language models (LLMs) exhibit generalizable epistemic behaviors under uncertainty or simply recall facts. It introduces a formal framework for testing epistemic stance transfer that couples targeted event unlearning with evaluation of model responses to novel scenarios, across two discourse domains (Russian/Ukrainian Telegram war blogs and US partisan Twitter). Empirically, aligned models remain close to organic baselines in Jensen–Shannon divergence $D_{JS}$ and maintain low epistemic entropy $H$ even after event knowledge is removed, indicating durable, transferable stance patterns. The findings imply that alignment embeds stable biases that persist beyond surface recall, which motivates framework-based safety audits and suggests two directions for follow-up: extending the evaluation to other domains and investigating the mechanisms underlying stance transfer.
Abstract
When large language models (LLMs) are aligned to a specific online community, do they exhibit generalizable behavioral patterns that mirror that community's attitudes and responses to new uncertainty, or are they simply recalling patterns from training data? We introduce a framework to test epistemic stance transfer: targeted deletion of event knowledge, validated with multiple probes, followed by evaluation of whether models still reproduce the community's organic response patterns under ignorance. Using Russian–Ukrainian military discourse and U.S. partisan Twitter data, we find that even after aggressive fact removal, aligned LLMs maintain stable, community-specific behavioral patterns for handling uncertainty. These results provide evidence that alignment encodes structured, generalizable behaviors beyond surface mimicry. Our framework offers a systematic way to detect behavioral biases that persist under ignorance, advancing efforts toward safer and more transparent LLM deployments.
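As a rough illustration of the two quantities named in the TL;DR, the sketch below computes a Jensen–Shannon divergence $D_{JS}$ between an organic and a model-generated response distribution, and a Shannon entropy $H$ for a single distribution. This summary does not give the paper's exact definitions, so several things here are assumptions: that responses are binned into a small set of stance categories, that $H$ is plain Shannon entropy over those categories, and that logarithms are base 2. The function names and the example distributions are hypothetical and are not results from the paper.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p.

    Zero-probability categories contribute nothing to the sum.
    """
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between distributions p and q.

    Uses D_JS(p, q) = H(m) - (H(p) + H(q)) / 2 with m = (p + q) / 2.
    With base-2 logarithms the value lies in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Hypothetical distributions over three assumed stance categories
# (illustrative numbers only, not data from the paper).
organic = [0.70, 0.20, 0.10]   # community's organic responses
aligned = [0.65, 0.25, 0.10]   # aligned model after event unlearning
uniform = [1 / 3, 1 / 3, 1 / 3]  # a maximally uncertain reference

print("D_JS(organic, aligned):", js_divergence(organic, aligned))
print("D_JS(organic, uniform):", js_divergence(organic, uniform))
print("H(aligned):", shannon_entropy(aligned))
print("H(uniform):", shannon_entropy(uniform))
```

Under these assumptions, a model that preserves the community's stance would show a small $D_{JS}$ against the organic baseline and an entropy well below that of the uniform reference. Note that some libraries report the Jensen–Shannon distance (the square root of the divergence), so values are not directly comparable without checking which one is used.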
