
PRISM: Probability Reallocation with In-Span Masking for Knowledge-Sensitive Alignment

Chenning Xu, Mao Zheng, Mingyang Song

Abstract

Supervised fine-tuning (SFT) with token-level hard labels can amplify overconfident imitation of factually unsupported targets, causing hallucinations that propagate in multi-sentence generation. We study an augmented SFT setting in which training instances include coarse sentence-level factuality risk labels and inter-sentence dependency annotations, providing structured signals about where factual commitments are weakly supported. We propose PRISM, a differentiable risk-gated framework that modifies learning only at fact-critical positions. PRISM augments standard SFT with a lightweight, model-aware probability reallocation objective that penalizes high-confidence predictions on risky target tokens, with its scope controlled by span-level risk weights and model-aware gating. Experiments on hallucination-sensitive factual benchmarks and general evaluations show that PRISM improves factual aggregates across backbones while maintaining a competitive overall capability profile. Ablations further show that the auxiliary signal is most effective when used conservatively, and that knowledge masking and model-aware reallocation play complementary roles in balancing factual correction and capability preservation.
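The objective described above can be illustrated with a minimal per-token sketch. This is a hypothetical reading of the abstract, not the paper's actual formulation: the gate threshold, the risk weight, the auxiliary weight `lam`, and the penalty form `-log(1 - p)` are all illustrative assumptions.

```python
import math

def prism_token_loss(p_target, risk_weight, lam=0.1, gate_threshold=0.9):
    """Hypothetical per-token loss sketch in the spirit of PRISM.

    p_target:       model probability assigned to the gold token
    risk_weight:    span-level factuality risk in [0, 1] (assumed form)
    lam:            auxiliary weight (the lambda of Figure 1)
    gate_threshold: model-aware gate -- the penalty fires only when the
                    model is already confident on a risky token
    """
    ce = -math.log(p_target)  # standard SFT cross-entropy term
    # Risk-gated auxiliary penalty: discourage high-confidence imitation
    # of risky targets, scaled by the span-level risk weight.
    gate = 1.0 if p_target > gate_threshold else 0.0
    penalty = gate * risk_weight * (-math.log(1.0 - p_target + 1e-9))
    return ce + lam * penalty
```

Under this sketch, low-confidence or low-risk tokens reduce to plain cross-entropy, while confident predictions on risky tokens incur an extra cost, matching the abstract's claim that learning is modified only at fact-critical positions.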


Paper Structure

This paper contains 35 sections, 15 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Averaged factual and common performance under different auxiliary weights λ, shown as Δ relative to the SFT baseline. The figure illustrates a trade-off between factual improvement and preservation of general capability.
  • Figure 2: Overall pipeline of our risk-gated knowledge-aware SFT (PRISM). Starting from instruction-response pairs, we extract atomic facts and fact relations, derive token-level fact spans and risk weights, and inject these signals into SFT via a model-aware, risk-gated probability reallocation objective.