Community-Aligned Behavior Under Uncertainty: Evidence of Epistemic Stance Transfer in LLMs

Patrick Gerard, Aiden Chang, Svitlana Volkova

TL;DR

This work addresses whether community-aligned large language models (LLMs) exhibit generalizable epistemic behaviors under uncertainty rather than simply recalling facts. It introduces a formal framework for epistemic stance transfer, coupling targeted event unlearning with assessments of model responses to novel scenarios using two discourse domains (Russian/Ukrainian Telegram war blogs and US partisan Twitter). Empirically, aligned models remain close to organic baselines in Jensen–Shannon divergence $D_{JS}$ and maintain low epistemic entropy $H$ even after removing event knowledge, indicating durable, transferable stance patterns. The findings imply that alignment embeds stable biases that persist beyond surface recall, motivating framework-based safety audits and suggesting pathways to generalize evaluation to other domains and to investigate the underlying mechanisms of stance transfer.
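The two quantities named above can be illustrated with a minimal sketch. It assumes that $D_{JS}$ is the standard Jensen–Shannon divergence between a model's distribution over stance categories and the community's organic distribution, and that $H$ is the Shannon entropy of the model's distribution; the paper's exact definitions may differ. The function names and the three-way stance distributions are hypothetical and are not taken from the paper.

```python
import math


def shannon_entropy(p, base=2.0):
    """Shannon entropy of a discrete distribution p (probabilities summing to 1)."""
    # Zero-probability categories contribute nothing to the sum.
    return -sum(x * math.log(x, base) for x in p if x > 0)


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence: H(M) - (H(P) + H(Q)) / 2, with M = (P + Q) / 2."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same categories")
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m, base) - 0.5 * (
        shannon_entropy(p, base) + shannon_entropy(q, base)
    )


# Hypothetical stance distributions over three illustrative categories
# (e.g. skeptical, neutral, credulous); NOT data from the paper.
organic = [0.6, 0.3, 0.1]
aligned_model = [0.5, 0.35, 0.15]

d_js = js_divergence(aligned_model, organic)
h = shannon_entropy(aligned_model)
print(f"D_JS = {d_js:.4f}, H = {h:.4f}")
```

With base-2 logarithms the divergence lies in [0, 1], so a value near 0 means the model's stance distribution is close to the organic baseline.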

Abstract

When large language models (LLMs) are aligned to a specific online community, do they exhibit generalizable behavioral patterns that mirror that community's attitudes and responses to new uncertainty, or are they simply recalling patterns from training data? We introduce a framework to test epistemic stance transfer: targeted deletion of event knowledge, validated with multiple probes, followed by evaluation of whether models still reproduce the community's organic response patterns under ignorance. Using Russian–Ukrainian military discourse and U.S. partisan Twitter data, we find that even after aggressive fact removal, aligned LLMs maintain stable, community-specific behavioral patterns for handling uncertainty. These results provide evidence that alignment encodes structured, generalizable behaviors beyond surface mimicry. Our framework offers a systematic way to detect behavioral biases that persist under ignorance, advancing efforts toward safer and more transparent LLM deployments.

Paper Structure

This paper contains 45 sections, 15 equations, 4 figures, 36 tables.

Figures (4)

  • Figure 1: Epistemic Stance Transfer Testing. This figure illustrates how we evaluate whether LLMs exhibit stable, community-aligned behavioral patterns under uncertainty or simply rely on factual recall. Top row: Models are aligned to a community using lightweight methods (e.g., system prompts, prepended examples) or heavyweight methods (fine-tuning on pre-event discourse), then have knowledge of key events deleted via SAMI to simulate pre-event ignorance. These knowledge-deleted models are then evaluated on novel scenarios. Bottom row: The knowledge deletion method is adapted from Su et al. [su2025concepts], and the visualization style is inspired by their work.
  • Figure 2: Russian Community Baseline
  • Figure 3: Ukrainian Community Baseline
  • Figure 5: Macro-averaged Jensen–Shannon divergence from organic baselines across all events and communities. Aligned conditions cluster at low divergences, while misaligned conditions show substantially higher divergences (0.300–0.435), confirming community-specific epistemic stance transfer. Error bars represent standard error across six event–community combinations.
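
The aggregation described in the Figure 5 caption (a macro average with standard-error bars over six event–community combinations) could be computed along the following lines. This is a sketch under assumptions: it takes the standard error to be the sample standard deviation divided by the square root of the number of combinations, which the caption does not spell out. The function name is hypothetical, and the six input values are synthetic placeholders, not results from the paper.

```python
import math
import statistics


def macro_average_with_se(divergences):
    """Return (mean, standard error) over per-combination divergences.

    Assumes SE = sample standard deviation / sqrt(n); the paper may use a
    different estimator.
    """
    n = len(divergences)
    if n == 0:
        raise ValueError("need at least one divergence value")
    mean = statistics.fmean(divergences)
    # A sample standard deviation needs at least two points.
    se = statistics.stdev(divergences) / math.sqrt(n) if n > 1 else 0.0
    return mean, se


# Synthetic placeholder values, one per event-community combination.
# These are NOT the paper's measurements.
per_combination = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

mean, se = macro_average_with_se(per_combination)
print(f"macro average = {mean:.3f} +/- {se:.3f}")
```

Each combination contributes equally to the mean regardless of how many posts or prompts it contains, which is what makes the average a macro average.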