Optimal information deletion and Bayes' theorem
Hans Montcho, Håvard Rue
TL;DR
The paper reframes Bayes' theorem as the optimal information processing rule for both learning and unlearning. It formulates an information deletion problem, in which the information content of a data subset is removed from a posterior, and solves it with a variational calculus approach. The resulting optimal information deletion rule, the one that neither creates nor destroys information, equals the leave-data-out posterior computed via Bayes' theorem, $\pi(\theta|y_{-g})$, which establishes a duality between learning and unlearning. The result gives a principled, variational route to leave-data-out posteriors and supports Bayesian unlearning via approximate methods. The work also discusses extensions to alternative information measures and loss functions, and highlights practical implications for cross-validation and data deletion tasks.
Abstract
In this same journal, Arnold Zellner published a seminal paper on Bayes' theorem as an optimal information processing rule. This result led to the variational formulation of Bayes' theorem, which is the central idea in generalized variational inference. Almost 40 years later, we revisit these ideas, but from the perspective of information deletion. We investigate rules that update a posterior distribution into an antedata distribution when a portion of the data is removed. In this context, a rule that neither destroys nor creates information is called the optimal information deletion rule, and we prove that it coincides with the traditional use of Bayes' theorem.
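To make the deletion result concrete, here is a minimal sketch in a conjugate Beta-Bernoulli model. It is an illustration under stated assumptions, not the paper's variational derivation. It assumes the removed observations (written `y_g` here, a label not used in the text above) are conditionally independent of the remaining ones given $\theta$, so that Bayes' theorem gives $\pi(\theta|y_{-g}) \propto \pi(\theta|y)/\pi(y_g|\theta)$. The function names, the prior, and the data are hypothetical.

```python
# Illustrative sketch only: Beta-Bernoulli model with hypothetical names and data.
# Assumption: observations are conditionally independent given theta, so deleting
# a group amounts to dividing the full posterior by the group's likelihood.

def beta_posterior(a, b, data):
    """Learning: Beta(a, b) prior updated with 0/1 observations."""
    successes = sum(data)
    failures = len(data) - successes
    return (a + successes, b + failures)

def delete_group(posterior, removed):
    """Unlearning: divide the Beta posterior by the likelihood of `removed`.

    For a Bernoulli likelihood this subtracts the removed successes and
    failures from the Beta parameters.
    """
    a_post, b_post = posterior
    successes = sum(removed)
    failures = len(removed) - successes
    return (a_post - successes, b_post - failures)

prior = (1, 1)                      # uniform prior on theta (assumed)
y = [1, 0, 1, 1, 0, 1, 1, 0]        # full data set (made up)
g = {1, 4, 6}                       # indices of the group to delete (made up)

y_g = [y[i] for i in sorted(g)]
y_minus_g = [y[i] for i in range(len(y)) if i not in g]

full = beta_posterior(*prior, y)            # posterior given all data
unlearned = delete_group(full, y_g)         # posterior after deleting y_g
refit = beta_posterior(*prior, y_minus_g)   # posterior fitted on y_{-g} only

# Deleting from the full posterior matches refitting without the group.
assert unlearned == refit
```

In this conjugate case the deletion step is exact and costs no refit. For non-conjugate models the same division would generally have to be approximated, which is where the approximate methods mentioned in the summary come in.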
