Knowledge Adaptation as Posterior Correction
Mohammad Emtiyaz Khan
TL;DR
The paper introduces posterior correction as a unifying framework for rapid knowledge adaptation across continual learning, unlearning, model merging, and federated learning. It recasts adaptation as updating an old posterior approximation with a correction term derived from the Bayesian Learning Rule, shows that many existing methods are special cases of this principle, and argues that more accurate posteriors often need smaller corrections. The work gives a spectrum of concrete instantiations (isotropic, diagonal, and full Gaussian posteriors) and relates methods based on regularization, prediction matching, and influence estimation, including Memory Replay and K-priors, to posterior-correction terms. Together, these insights offer a principled path to designing faster, more reliable, and more scalable adaptive algorithms for sequential and distributed learning tasks.
Abstract
Adaptation is the holy grail of intelligence, but even the best AI models lack the adaptability of toddlers. In spite of great progress, little is known about the mechanisms by which machines can learn to adapt as fast as humans and animals. Here, we cast adaptation as 'correction' of old posteriors and show that a wide variety of existing adaptation methods follow this very principle, including those used for continual learning, federated learning, unlearning, and model merging. In all these settings, more accurate posteriors often lead to smaller corrections and can enable faster adaptation. Posterior correction is derived by using the dual representation of the Bayesian Learning Rule of Khan and Rue (2023), where the interference between the old representation and new information is quantified by using the natural-gradient mismatch. We present many examples demonstrating how machines can learn to adapt quickly by using posterior correction.
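The claim that more accurate posteriors lead to smaller corrections can be illustrated with a toy example. The sketch below is not the paper's natural-gradient correction term. It is a minimal illustration, assuming Bayesian linear regression with unit noise variance and a Gaussian prior with precision `delta`, and all data values and helper names (`gram`, `adapt`, `gap`) are hypothetical. The old posterior is reused as a prior when new data arrives, and the distance between the adapted mean and the batch (retrain-on-everything) mean is taken as a proxy for the correction that would still be needed. Keeping the diagonal of the precision matrix, or its average, is just one possible choice of diagonal and isotropic approximation.

```python
import math

def gram(X):
    """Return the 2x2 matrix X^T X for a list of 2-D feature rows."""
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    return [[a, b], [b, d]]

def xty(X, y):
    """Return the 2-vector X^T y."""
    return [sum(x[0] * t for x, t in zip(X, y)),
            sum(x[1] * t for x, t in zip(X, y))]

def add(A, B):
    """Elementwise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def matvec(A, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [A[i][0] * v[0] + A[i][1] * v[1] for i in range(2)]

def solve2(S, b):
    """Solve the 2x2 linear system S m = b by Cramer's rule."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(S[1][1] * b[0] - S[0][1] * b[1]) / det,
            (S[0][0] * b[1] - S[1][0] * b[0]) / det]

def adapt(P_old, m_old, X_new, y_new):
    """Posterior mean after using N(m_old, P_old^{-1}) as the prior for new data."""
    S = add(P_old, gram(X_new))
    b = [p + q for p, q in zip(matvec(P_old, m_old), xty(X_new, y_new))]
    return solve2(S, b)

def gap(u, v):
    """Euclidean distance between two 2-vectors."""
    return math.hypot(u[0] - v[0], u[1] - v[1])

# Hypothetical data: correlated features, so off-diagonal precision matters.
delta = 1.0
X1, y1 = [(1.0, 0.9), (2.0, 2.1), (3.0, 2.9)], [1.0, 2.0, 3.2]  # old task
X2, y2 = [(1.0, -1.0), (0.5, 1.5)], [0.3, 0.8]                   # new data

# Exact old posterior (precision P_full, mean m_old).
P_full = add([[delta, 0.0], [0.0, delta]], gram(X1))
m_old = solve2(P_full, xty(X1, y1))

# Batch reference: retrain on old and new data together.
m_batch = solve2(add(P_full, gram(X2)),
                 [p + q for p, q in zip(xty(X1, y1), xty(X2, y2))])

# Coarser approximations of the old posterior (assumed constructions).
P_diag = [[P_full[0][0], 0.0], [0.0, P_full[1][1]]]
s = 0.5 * (P_full[0][0] + P_full[1][1])
P_iso = [[s, 0.0], [0.0, s]]

# Remaining mismatch after adapting from each approximation.
gap_full = gap(adapt(P_full, m_old, X2, y2), m_batch)
gap_diag = gap(adapt(P_diag, m_old, X2, y2), m_batch)
gap_iso = gap(adapt(P_iso, m_old, X2, y2), m_batch)

print("full:", gap_full, "diagonal:", gap_diag, "isotropic:", gap_iso)
```

In this linear-Gaussian setting the full Gaussian posterior is exact, so adapting from it reproduces the batch solution up to floating-point error and nothing remains to correct. The diagonal and isotropic approximations discard information about the old data and leave a nonzero mismatch. The relative size of the diagonal and isotropic gaps depends on the data, so no ordering between them should be read into this example.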
