Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation

Hanwen Shen, Ting Ying, Jiajie Lu, Shanshan Wang

Abstract

Although debiased LLMs perform well on known bias patterns, they often fail to generalize to unfamiliar bias prompts and produce toxic outputs. We first use OOD detection to validate that such high-bias prompts constitute a distribution shift, and show that static models degrade under this shift. To adapt on the fly, we propose CAP-TTA, a test-time adaptation framework that performs context-aware LoRA updates only when a bias-risk trigger score exceeds a threshold, using a precomputed diagonal preconditioner for fast and stable updates. Across toxic-prompt settings and benchmarks, CAP-TTA reduces bias (confirmed by human evaluation) while achieving much lower update latency than AdamW or SGD. It also mitigates catastrophic forgetting, significantly improving narrative fluency over the state-of-the-art (SOTA) debiasing baseline while maintaining comparable debiasing effectiveness.
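The trigger-gated, preconditioned update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions. The diagonal preconditioner is taken to be the inverse square root of offline mean squared gradients (an Adagrad-style or diagonal-Fisher scaling). The helper names `estimate_diag_preconditioner` and `cap_tta_step` and the learning rate `lr` are hypothetical. The LoRA parameters are flattened into a plain list of floats.

```python
import math
from typing import List, Sequence, Tuple


def estimate_diag_preconditioner(
    grad_samples: Sequence[Sequence[float]], damping: float = 1e-8
) -> List[float]:
    """Offline phase (assumed form): P0[i] = 1 / sqrt(mean_k g_k[i]^2 + damping)."""
    n = len(grad_samples)
    dim = len(grad_samples[0])
    # Per-coordinate second moment of the offline gradient samples.
    second_moment = [sum(g[i] ** 2 for g in grad_samples) / n for i in range(dim)]
    return [1.0 / math.sqrt(v + damping) for v in second_moment]


def cap_tta_step(
    params: Sequence[float],
    grad: Sequence[float],
    p0: Sequence[float],
    trigger_score: float,
    eps: float,
    lr: float = 0.1,
) -> Tuple[List[float], bool]:
    """Online phase: update only if the trigger score exceeds the threshold eps.

    Returns the (possibly updated) parameters and whether an update fired.
    """
    if trigger_score <= eps:
        # Below threshold: keep the adapter frozen for this step.
        return list(params), False
    # Elementwise preconditioned step: w <- w - lr * P0 * g (no optimizer state).
    updated = [w - lr * p * g for w, p, g in zip(params, p0, grad)]
    return updated, True


# Usage with made-up gradients: P0 is about [0.5, 2.0].
p0 = estimate_diag_preconditioner([[2.0, 0.5], [2.0, 0.5]])
print(cap_tta_step([1.0, 1.0], [2.0, 0.5], p0, trigger_score=0.2, eps=0.5))
print(cap_tta_step([1.0, 1.0], [2.0, 0.5], p0, trigger_score=0.9, eps=0.5))
```

Because P0 is fixed after the offline pass in this sketch, each triggered online step is a single elementwise rescaling of the gradient with no moment buffers to maintain. That is one plausible reason such an update can be cheaper than an AdamW step.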

Paper Structure (79 sections, 36 equations, 9 figures, 11 tables, 1 algorithm)

Figures (9)

  • Figure 1: Static generation vs. prior test-time adaptation (TTA) vs. CAP-TTA. Static generation uses frozen parameters. Prior TTA performs online updates during generation, which can be costly and unstable. CAP-TTA decouples adaptation into an offline precomputed preconditioner $P_{0}$ and an online bias-triggered, lightweight preconditioned update (optionally routed to a safe corpus with 4 types) when the trigger score exceeds $\epsilon$.
  • Figure 2: Graded by the bias trigger score. Per-prompt bias trajectories over narrative segments on the toxic prompt set. Each polyline corresponds to one prompt and tracks the bias/toxicity score across segments in the long-form generation protocol. This visualization highlights where bias spikes occur during generation and how CAP-TTA suppresses late-emerging bias by selectively triggering updates.
  • Figure 3: Graded by the bias trigger score. Triggering trade-off for CAP-TTA. We compare the bias trigger score against the threshold $\epsilon$ to check whether triggered updates lower the bias trigger score.
  • Figure 4: Graded by the bias trigger score. Comparison of static baselines on toxic prompts. Each point corresponds to one generated sample; the plot contrasts methods in terms of safety-related scores versus generation quality indicators (e.g., fluency/proxy perplexity). The spread and relative position of the clusters show that purely static detox/debias checkpoints can behave heterogeneously across prompts.
  • Figure 5: Graded by the final bias score (bb_bias_final). Empirical CDF of the bias metric (bb_bias_final, lower is better) for Qwen3 under TTA-only and ablation variants.
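The triggering trade-off in Figure 3 and the CDF comparison in Figure 5 reduce to two simple computations, sketched below under assumptions. The score values are invented for illustration, and `trigger_rate` and `ecdf_at` are hypothetical helper names. "Trigger rate" is taken here to mean the share of narrative segments whose trigger score exceeds $\epsilon$, which may differ from the paper's exact definition.

```python
from bisect import bisect_right
from typing import Sequence


def trigger_rate(trajectories: Sequence[Sequence[float]], eps: float) -> float:
    """Fraction of narrative segments whose trigger score exceeds eps.

    Each inner sequence holds one prompt's per-segment trigger scores.
    """
    scores = [s for traj in trajectories for s in traj]
    return sum(s > eps for s in scores) / len(scores)


def ecdf_at(values: Sequence[float], t: float) -> float:
    """Empirical CDF: share of samples whose value is <= t."""
    xs = sorted(values)
    return bisect_right(xs, t) / len(xs)


# Hypothetical per-segment trigger scores for two prompts.
trajectories = [[0.1, 0.6, 0.2], [0.3, 0.8, 0.7]]
for eps in (0.5, 0.75):
    print(eps, trigger_rate(trajectories, eps))

# Hypothetical final bias scores (lower is better) for one variant.
final_bias = [0.4, 0.1, 0.3, 0.2]
print(ecdf_at(final_bias, 0.25))
```

With these made-up scores, raising $\epsilon$ from 0.5 to 0.75 lowers the trigger rate from 3/6 to 1/6, so fewer updates fire. For a lower-is-better metric such as bb_bias_final, a variant whose CDF rises earlier (a higher `ecdf_at` value at small t) places more samples at low bias.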