Optimal information deletion and Bayes' theorem

Hans Montcho; Håvard Rue

TL;DR

The paper reframes Bayes' theorem as the optimal information processing rule for both learning and unlearning. It formulates an information deletion problem, in which the information contributed by a data subset $y_g$ is removed from the posterior $\pi(\theta|y)$, and solves it with a variational calculus approach. The resulting optimal information deletion rule equals the leave-data-out posterior $\pi(\theta|y_{-g})$ given by Bayes' theorem, which establishes a duality between learning and unlearning. This result provides a principled variational route to leave-data-out posteriors and supports Bayesian unlearning via approximate methods. The work also discusses extensions to alternative information measures and loss functions, and highlights practical implications for cross-validation and data deletion tasks. The leave-data-out posterior $\pi(\theta|y_{-g})$ is thus the exact, information-preserving outcome of data removal, reinforcing the foundational role of Bayes' theorem in both updating and unlearning.
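The identity behind this result can be checked directly from Bayes' theorem. The short derivation below is the classical route, not the paper's variational argument. It assumes that $y_g$ and $y_{-g}$ are conditionally independent given $\theta$, and writes $\pi(\theta)$ for the prior and $p(\cdot \mid \theta)$ for the likelihood (our notation):

$$
\pi(\theta \mid y) = \frac{\pi(\theta)\, p(y_{-g} \mid \theta)\, p(y_g \mid \theta)}{p(y)} = \pi(\theta \mid y_{-g})\, \frac{p(y_g \mid \theta)}{p(y_g \mid y_{-g})},
$$

using $\pi(\theta)\, p(y_{-g} \mid \theta) = \pi(\theta \mid y_{-g})\, p(y_{-g})$ and $p(y)/p(y_{-g}) = p(y_g \mid y_{-g})$. Rearranging, and assuming $p(y_g \mid \theta) > 0$ wherever the posterior has mass:

$$
\pi(\theta \mid y_{-g}) = \frac{\pi(\theta \mid y) \,/\, p(y_g \mid \theta)}{\int \pi(\vartheta \mid y) \,/\, p(y_g \mid \vartheta)\, d\vartheta},
\qquad
\int \frac{\pi(\vartheta \mid y)}{p(y_g \mid \vartheta)}\, d\vartheta = \frac{1}{p(y_g \mid y_{-g})}.
$$

Deleting $y_g$ therefore amounts to dividing out its likelihood and renormalising. The normalising constant is the reciprocal of the leave-data-out predictive density $p(y_g \mid y_{-g})$, which is the quantity that cross-validation targets.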

Abstract

In this same journal, Arnold Zellner published a seminal paper on Bayes' theorem as an optimal information processing rule. This result led to the variational formulation of Bayes' theorem, which is the central idea in generalized variational inference. Almost 40 years later, we revisit these ideas from the perspective of information deletion. We investigate rules that update a posterior distribution into an antedata distribution when a portion of the data is removed. In this context, a rule that neither destroys nor creates information is called the optimal information deletion rule, and we prove that it coincides with the traditional use of Bayes' theorem.
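For reference, the variational reading of Zellner's rule can be sketched as follows. This assumes his information measures for a candidate posterior density $q(\theta)$. Input information is $\int q(\theta)\log\pi(\theta)\,d\theta + \int q(\theta)\log p(y\mid\theta)\,d\theta$, from the prior and the likelihood. Output information is $\int q(\theta)\log q(\theta)\,d\theta + \log p(y)$, from the posterior and the marginal density of the data. Their difference is a Kullback-Leibler divergence:

$$
\Delta[q] = \int q(\theta)\log q(\theta)\,d\theta + \log p(y) - \int q(\theta)\log \pi(\theta)\,d\theta - \int q(\theta)\log p(y \mid \theta)\,d\theta
= \int q(\theta)\log\frac{q(\theta)}{\pi(\theta \mid y)}\,d\theta = \mathrm{KL}\big(q \,\|\, \pi(\cdot \mid y)\big) \ge 0.
$$

The middle step uses $\int q(\theta)\,d\theta = 1$ and $\pi(\theta \mid y) = \pi(\theta)\,p(y \mid \theta)/p(y)$. Equality holds if and only if $q(\theta) = \pi(\theta \mid y)$ almost everywhere, so output information equals input information exactly when Bayes' theorem is used.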

Paper Structure (11 sections, 2 theorems, 20 equations, 3 figures)

Introduction
Optimal information processing rule
Information concept and optimal information deletion rule
Derivation of the optimal information deletion rule
Equivalence between the optimal information deletion rule and Bayes' theorem
Discussion
Appendix
Proof of Theorem 1
Proof of the optimality of information deletion rule
Proof of Theorem 2
On the conditional independence assumption

Key Result

Theorem 1

Under the previous assumptions, the minimum of the functional $\Delta[q(\theta|y_{-g})]$ is given by:

Figures (3)

Figure 1: Illustration of the Information Processing Rule
Figure 2: Illustration of the Information Deletion Rule. The interest lies in deleting the information contribution of $y_g$ from the posterior distribution $\pi(\theta|y)$.
Figure 3: Illustration of the role of $\pi(\theta|y_{-g})$ as a prior or a posterior distribution.
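The deletion of $y_g$ from $\pi(\theta|y)$ and the two roles of $\pi(\theta|y_{-g})$ can be illustrated with a small numerical sketch on a grid. It uses a hypothetical Beta-Bernoulli model with observations that are conditionally independent given $\theta$. The model, the data, and the helper names `normalise`, `log_lik` and `grid_mean` are ours, not the paper's. Dividing the full posterior by $p(y_g \mid \theta)$ and renormalising should recover the posterior computed from $y_{-g}$ alone. Multiplying that result back by $p(y_g \mid \theta)$ should recover the full posterior.

```python
import numpy as np

# Hypothetical toy model: Beta(a, b) prior on theta, Bernoulli observations
# assumed conditionally independent given theta.
a, b = 2.0, 2.0
y_minus_g = np.array([1, 0, 1, 1, 0, 1])  # data that remain, y_{-g}
y_g = np.array([1, 1, 0])                 # data to delete, y_g
y = np.concatenate([y_minus_g, y_g])      # full data set

# Fine grid over the open interval (0, 1); densities are normalised with
# a simple Riemann sum.
theta = np.linspace(1e-4, 1.0 - 1e-4, 20_001)
dtheta = theta[1] - theta[0]


def normalise(density):
    """Rescale an unnormalised density on the grid so it integrates to 1."""
    return density / (density.sum() * dtheta)


def log_lik(data, th):
    """Bernoulli log-likelihood of `data` at each grid value `th`."""
    k, n = data.sum(), data.size
    return k * np.log(th) + (n - k) * np.log1p(-th)


def grid_mean(density):
    """Mean of theta under a normalised grid density."""
    return np.sum(theta * density) * dtheta


log_prior = (a - 1.0) * np.log(theta) + (b - 1.0) * np.log1p(-theta)

# Learning: Bayes' theorem on the full data y gives pi(theta | y).
post_full = normalise(np.exp(log_prior + log_lik(y, theta)))

# Unlearning: divide out the likelihood of y_g, then renormalise.
post_deleted = normalise(post_full / np.exp(log_lik(y_g, theta)))

# Reference: Bayes' theorem applied directly to y_{-g}.
post_direct = normalise(np.exp(log_prior + log_lik(y_minus_g, theta)))

# Duality: using pi(theta | y_{-g}) as the prior and learning y_g
# gives back pi(theta | y).
post_relearned = normalise(post_direct * np.exp(log_lik(y_g, theta)))

print(np.max(np.abs(post_deleted - post_direct)))
print(np.max(np.abs(post_relearned - post_full)))
# Compare with the conjugate leave-data-out posterior Beta(6, 4), whose mean is 0.6.
print(grid_mean(post_deleted))
```

A grid is only practical in one or two dimensions. Here the division is exact, so any remaining discrepancy reflects floating-point and quadrature error rather than the identity itself.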

Theorems & Definitions (2)

Theorem 1: Optimal information deletion rule
Theorem 2: Equivalence of the IDR (information deletion rule) and Bayes' theorem
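When the posterior is available only through samples, the Bayes-theorem form of the leave-data-out posterior suggests a simple approximate deletion. Because $\pi(\theta|y_{-g}) \propto \pi(\theta|y)/p(y_g|\theta)$ under conditional independence, posterior draws can be reweighted by $1/p(y_g|\theta)$. This is plain self-normalised importance sampling, the idea behind importance-sampling leave-one-out cross-validation. It is not presented as the paper's procedure. The sketch below assumes a hypothetical Beta-Bernoulli model and uses exact conjugate draws as a stand-in for MCMC output.

```python
import numpy as np

# Hypothetical toy model: Beta(2, 2) prior, Bernoulli likelihood,
# observations conditionally independent given theta.
a, b = 2.0, 2.0
y_minus_g = np.array([1, 0, 1, 1, 0, 1])  # data that remain, y_{-g}
y_g = np.array([1, 1, 0])                 # data to delete, y_g

k_all = y_minus_g.sum() + y_g.sum()
n_all = y_minus_g.size + y_g.size

# Stand-in for posterior samples from a fitted model: exact draws from the
# conjugate full-data posterior Beta(a + k, b + n - k).
rng = np.random.default_rng(seed=0)
draws = rng.beta(a + k_all, b + n_all - k_all, size=200_000)

# Log-likelihood of the deleted subset y_g at each draw.
k_g, n_g = y_g.sum(), y_g.size
log_lik_g = k_g * np.log(draws) + (n_g - k_g) * np.log1p(-draws)

# Self-normalised importance weights proportional to 1 / p(y_g | theta),
# computed in log space for numerical stability.
log_w = -log_lik_g
w = np.exp(log_w - log_w.max())
w /= w.sum()

mean_deleted = np.sum(w * draws)  # estimate of E[theta | y_{-g}]
ess = 1.0 / np.sum(w ** 2)        # effective sample size after reweighting

# Exact leave-data-out posterior mean for comparison: Beta(a + k, b + n - k) on y_{-g}.
k_r, n_r = y_minus_g.sum(), y_minus_g.size
mean_exact = (a + k_r) / (a + b + n_r)

print(mean_deleted, mean_exact, ess)
```

If $y_g$ is influential, the weights become heavy-tailed and the effective sample size collapses. Pareto-smoothed importance sampling is one established way to diagnose and stabilise this, but it is not part of this sketch.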
