Learn by Surprise, Commit by Proof

Kang-Sin Choi

Abstract

We propose LSCP, a self-gated post-training framework for autonomous knowledge acquisition: learning only what a model does not already know, verified against what it does know, at a strength proportional to conviction, with no external oracle. When a passage produces anomalously high per-token loss, LSCP flags it, generates a Q&A chain that forces the model to articulate its own knowledge and identify gaps, then adjusts AdamW's $\beta_2$ proportionally to conviction depth $k$ (the number of self-verification steps the passage survives) via $\beta_2 = 0.999 \cdot r^k$. The entire learning intensity is governed by a single parameter $r$. Beyond new knowledge, this process sharpens weakly encoded existing knowledge, a primary source of hallucination. The framework is self-extinguishing: as the model learns, per-token loss on learned passages decreases toward the surprisal threshold and the system progressively converges to standard AdamW. This models biological memory consolidation: temporary information in the context window is selectively consolidated into parametric weights, the model's long-term memory. Experiments on the reference model (Qwen3-14B) and across six models (8B--32B, four families) show that standard fine-tuning produces rote memorization (a perturbation gap, the ratio of paraphrase to original perplexity, of $11.6 \pm 0.2\times$ baseline) while all LSCP conditions learn semantically ($2.7$--$3.0\times$). The $r=1.0$ condition (identical optimizer, nearly identical data, only Q&A format differs) confirms that the training data format, not $\beta_2$ gating, is the primary mechanism preventing memorization; gating instead protects neighboring knowledge from contamination by corrupt content ($93 \pm 7\%$ accuracy on adjacent questions at $r=0.98$ vs. $90\%$ baseline).

Paper Structure

This paper contains 49 sections, 7 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: LSCP pipeline overview. Stage 1 detects surprising passages via passage-level surprisal. Stage 2 generates Q&A pairs, checks consistency against existing knowledge, grades each passage by conviction depth $k$, and annotates epistemic boundaries. Stage 3 adjusts AdamW's $\beta_2$ proportionally to $k$, selectively opening the Variance Lock for verified content.
  • Figure 2: Perturbation gap by condition under graduated accept ($n=5$). Normal produces a gap of $11.6\times$ baseline (rote memorization). All LSCP conditions produce gaps near baseline (semantic learning). Error bars: $\pm 1$ std.
  • Figure 3: Effective $\beta_2 = 0.999 \cdot r^k$ by decay factor $r$ and conviction depth $k$. Green: AdamW regime ($\beta_2 > 0.85$). Yellow: boundary ($0.5 < \beta_2 < 0.85$). Orange: SGD-like ($\beta_2 < 0.5$). The optimal $r=0.98$ keeps most items in the green zone.
  • Figure 4: Statistical significance of $\beta_2$ gating across $r$ values ($n=5$ per condition; $r=0.98$: $n=6$). (A) Corrupt-adjacent accuracy: $r=0.98$ achieves the best protection ($93 \pm 7\%$), above baseline ($90\%$). (B) PPL reduction increases monotonically with stronger gating while unrelated accuracy remains near $100\%$, confirming no catastrophic forgetting. Error bars: $\pm 1$ std.
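The gating schedule behind Figure 3 is a single expression, $\beta_2 = 0.999 \cdot r^k$, so its behavior is easy to tabulate. The sketch below (an illustration, not code from the paper; the function names `effective_beta2` and `regime` and the regime thresholds are taken from the Figure 3 caption) shows how the effective $\beta_2$ moves from the AdamW regime toward SGD-like updates as conviction depth $k$ grows, and why $r=0.98$ keeps most items in the AdamW zone.

```python
# Illustrative sketch of LSCP's conviction-depth gating of AdamW's beta2.
# effective_beta2 and regime are hypothetical helper names; the regime
# bands (>0.85 AdamW, 0.5-0.85 boundary, <0.5 SGD-like) follow Figure 3.

def effective_beta2(r: float, k: int, base: float = 0.999) -> float:
    """Second-moment decay gated by conviction depth k: beta2 = base * r**k."""
    return base * r**k

def regime(beta2: float) -> str:
    """Map an effective beta2 value to the qualitative bands of Figure 3."""
    if beta2 > 0.85:
        return "AdamW"
    if beta2 > 0.5:
        return "boundary"
    return "SGD-like"

if __name__ == "__main__":
    # r = 1.0 never gates (beta2 stays 0.999 for any k); smaller r
    # pushes high-conviction passages toward SGD-like updates.
    for r in (1.0, 0.98, 0.90):
        for k in (0, 5, 20):
            b2 = effective_beta2(r, k)
            print(f"r={r:.2f}  k={k:2d}  beta2={b2:.3f}  ({regime(b2)})")
```

Note the self-extinguishing property falls out directly: as learned passages stop triggering the surprisal threshold, $k$ effectively stays small and $\beta_2$ remains in the standard-AdamW zone.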