A Quantitative Characterization of Forgetting in Post-Training

Krishnakumar Balasubramanian; Shiva Prasad Kasiviswanathan

Abstract

Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develop theoretical results under a two-mode mixture abstraction (representing old and new tasks), proposed by Chen et al. (2025) (arXiv:2510.18874), and formalize forgetting in two forms: (i) mass forgetting, where the old mixture weight collapses to zero, and (ii) old-component drift, where an already-correct old component shifts during training. For equal-covariance Gaussian modes, we prove that forward-KL objectives trained on data from the new distribution drive the old weight to zero, while reverse-KL objectives converge to the true target (thereby avoiding mass forgetting) and perturb the old mean only through overlap-gated misassignment probabilities controlled by the Bhattacharyya coefficient, yielding drift that decays exponentially with mode separation and a locally well-conditioned geometry with exponential convergence. We further quantify how replay interacts with these objectives. For forward-KL, replay must modify the training distribution to change the population optimum; for reverse-KL, replay leaves the population objective unchanged but prevents finite-batch old-mode starvation through bounded importance weighting. Finally, we analyze three recently proposed near-on-policy post-training methods, SDFT (arXiv:2601.19897), TTT-Discover (arXiv:2601.16175), and OAPL (arXiv:2602.19362), through the same lens and derive explicit conditions under which each retains old mass and exhibits overlap-controlled drift. Overall, our results show that forgetting can be precisely quantified in terms of the interaction between divergence direction, geometric behavioral overlap, sampling regime, and the visibility of past behavior during training.
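
As a rough illustration of two quantities named in the abstract, the sketch below works in one dimension with equal-variance Gaussian modes. It is not the paper's construction: the function names, the fixed-component model family `beta * p_old + (1 - beta) * p_new`, and the Riemann-sum integration are assumptions made here for illustration. The closed form `exp(-(mu_old - mu_new)^2 / (8 sigma^2))` is the standard Bhattacharyya coefficient for two Gaussians with equal variance, and the forward-KL function measures the divergence from new-mode data to the mixture as a function of the old weight.

```python
import math


def bhattacharyya_equal_var(mu_old, mu_new, sigma):
    """Closed-form Bhattacharyya coefficient of N(mu_old, sigma^2) and N(mu_new, sigma^2)."""
    delta = (mu_old - mu_new) / sigma
    return math.exp(-delta * delta / 8.0)


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def _grid(mu_old, mu_new, sigma, n, width):
    """Uniform grid covering both modes (hypothetical helper for this sketch)."""
    lo = min(mu_old, mu_new) - width * sigma
    hi = max(mu_old, mu_new) + width * sigma
    h = (hi - lo) / (n - 1)
    return [lo + i * h for i in range(n)], h


def bhattacharyya_numeric(mu_old, mu_new, sigma, n=20001, width=12.0):
    """Riemann-sum estimate of the integral of sqrt(p_old * p_new)."""
    xs, h = _grid(mu_old, mu_new, sigma, n, width)
    return h * sum(
        math.sqrt(normal_pdf(x, mu_old, sigma) * normal_pdf(x, mu_new, sigma))
        for x in xs
    )


def forward_kl_to_new(beta, mu_old, mu_new, sigma, n=20001, width=12.0):
    """Riemann-sum estimate of KL(p_new || beta * p_old + (1 - beta) * p_new).

    Assumption: the components are held fixed and only the old weight beta varies,
    with 0 <= beta < 1 so the mixture is positive wherever p_new is.
    """
    xs, h = _grid(mu_old, mu_new, sigma, n, width)
    total = 0.0
    for x in xs:
        p_new = normal_pdf(x, mu_new, sigma)
        if p_new == 0.0:
            continue  # zero-density points contribute nothing to the integral
        q = beta * normal_pdf(x, mu_old, sigma) + (1.0 - beta) * p_new
        total += p_new * math.log(p_new / q)
    return total * h


if __name__ == "__main__":
    # Overlap as a function of mode separation (in units of sigma).
    for sep in (1.0, 2.0, 4.0, 8.0):
        print(sep, bhattacharyya_equal_var(0.0, sep, 1.0))
    # Forward KL from new-mode data as a function of the old weight.
    for beta in (0.0, 0.1, 0.25, 0.5):
        print(beta, forward_kl_to_new(beta, 0.0, 4.0, 1.0))
```

In this toy setting the forward-KL value is zero at `beta = 0` and grows with `beta`, which is consistent with the abstract's statement that training on new-distribution data alone pushes the old weight toward zero, while the overlap coefficient shrinks exponentially in the squared separation.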

Paper Structure

This paper contains 57 sections, 28 theorems, 393 equations, and 1 table.

Introduction
Effect of Replay on SFT and RL.
Near-on-policy Methods.
Intuition via Disjoint-support Case
Related Works
Forgetting in Forward and Reverse-KL Objectives
Forgetting in Two-component Mixture Model
Bhattacharyya Overlap and a Responsibility Bound.
Forward-KL SFT Exhibits Mass Forgetting
When Does Replay Prevent Forgetting under Forward-KL SFT?
Reverse-KL RL Avoids Mass Forgetting and Controls Old-Component Drift
A Local Exponential Rate for Reverse-KL Minimization
Replay Improves Reverse-KL Methods in Practice
Replay-Mixed Sampling with Bounded Importance Weights.
Summarizing Consequences for SFT and RL Post-Training
...and 42 more sections

Key Result

Lemma 1.1

Under the disjoint-support assumption,

Theorems & Definitions (69)

Definition 1.1
Lemma 1.1: Exact KL decompositions under disjoint supports
Proof of Lemma 1.1
Remark 1.1: Exact mode locality for shape parameters
Remark 1.2: Why Weights Can Still Collapse Under Forward-KL
Definition 2.1: Mass Forgetting
Definition 2.2: $\varepsilon$-Bounded Drift of the Old Component
Lemma 2.1: Posterior Leakage Bound via Bhattacharyya Coefficient
Remark 2.1: Bhattacharyya Coefficient for Equal-covariance Gaussians
Theorem 2.1: Mass Forgetting in Forward-KL SFT
...and 59 more
