Optimal information deletion and Bayes' theorem

Hans Montcho, Håvard Rue

TL;DR

The paper reframes Bayes' theorem as the optimal information processing rule for both learning and unlearning. It does so by formulating an information deletion problem: removing the information contributed by a data subset $y_g$ from the posterior $\pi(\theta|y)$. Using a variational calculus approach, the authors derive the optimal information deletion rule, show that it equals the leave-data-out posterior $\pi(\theta|y_{-g})$ computed via Bayes' theorem, and establish a duality between learning and unlearning. The result gives a principled, variational route to leave-data-out posteriors and supports Bayesian unlearning via approximate methods. The work also discusses extensions to alternative information measures and loss functions, and highlights practical implications for cross-validation and data deletion tasks. The leave-data-out posterior $\pi(\theta|y_{-g})$ is thus the outcome of data removal that neither creates nor destroys information, which reinforces the foundational role of Bayes' theorem in both updating and unlearning.
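
The duality between learning and unlearning can be illustrated with the standard Bayes identity. This is a sketch rather than the paper's variational derivation. It assumes that $y_g$ and $y_{-g}$ are conditionally independent given $\theta$, so that the likelihood of the removed block is $\pi(y_g|\theta)$; the predictive density $\pi(y_g|y_{-g})$ is notation introduced here for the normalizing constant.

$$
\pi(\theta|y) = \frac{\pi(y_g|\theta)\,\pi(\theta|y_{-g})}{\pi(y_g|y_{-g})}
\quad\Longleftrightarrow\quad
\pi(\theta|y_{-g}) = \frac{\pi(\theta|y)\,\pi(y_g|y_{-g})}{\pi(y_g|\theta)} \;\propto\; \frac{\pi(\theta|y)}{\pi(y_g|\theta)}.
$$

Read left to right, the first form is learning: $\pi(\theta|y_{-g})$ acts as the prior and is multiplied by the likelihood of $y_g$. The second form is unlearning: the full posterior is divided by that same likelihood and renormalized.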

Abstract

In this same journal, Arnold Zellner published a seminal paper on Bayes' theorem as an optimal information processing rule. This result led to the variational formulation of Bayes' theorem, which is the central idea in generalized variational inference. Almost 40 years later, we revisit these ideas, but from the perspective of information deletion. We investigate rules which update a posterior distribution into an antedata distribution when a portion of data is removed. In such context, a rule which does not destroy or create information is called the optimal information deletion rule and we prove that it coincides with the traditional use of Bayes' theorem.
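
To make the idea of updating a posterior into an antedata distribution concrete, the following minimal sketch checks numerically that dividing the full posterior by the likelihood of the removed data and renormalizing reproduces the posterior refitted without that data. The Beta-Bernoulli model, the toy data, and all function names (`beta_pdf`, `bernoulli_lik`, `delete_by_division`) are illustrative assumptions, not taken from the paper, and the division rule used here is the plain Bayes identity under conditional independence rather than the authors' variational construction.

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))

def bernoulli_lik(theta, data):
    """Likelihood of i.i.d. Bernoulli observations given theta."""
    s = sum(data)
    return theta ** s * (1 - theta) ** (len(data) - s)

def delete_by_division(grid, posterior, removed):
    """Divide the posterior by the likelihood of the removed data,
    then renormalize on an evenly spaced grid (midpoint rule)."""
    unnorm = [p / bernoulli_lik(t, removed) for t, p in zip(grid, posterior)]
    step = grid[1] - grid[0]
    total = sum(unnorm) * step
    return [u / total for u in unnorm]

# Hypothetical prior Beta(a0, b0) and toy data (assumptions for illustration).
a0, b0 = 2.0, 2.0
y = [1, 0, 1, 1, 0, 1, 1, 0]
g = [0, 1, 2]                                  # indices of the data to delete
y_g = [y[i] for i in g]
y_minus_g = [v for i, v in enumerate(y) if i not in g]

# Midpoint grid on (0, 1); it avoids theta = 0 and theta = 1 exactly.
n_grid = 2000
grid = [(i + 0.5) / n_grid for i in range(n_grid)]

# Full posterior pi(theta | y), available in closed form by conjugacy.
full_post = [beta_pdf(t, a0 + sum(y), b0 + len(y) - sum(y)) for t in grid]

# Unlearning: remove the contribution of y_g from the full posterior.
unlearned = delete_by_division(grid, full_post, y_g)

# Reference: posterior refitted from scratch on y_{-g} only.
refit = [beta_pdf(t, a0 + sum(y_minus_g),
                  b0 + len(y_minus_g) - sum(y_minus_g)) for t in grid]

max_abs_diff = max(abs(u - r) for u, r in zip(unlearned, refit))
print("max |unlearned - refit| on the grid:", max_abs_diff)
```

The two curves agree up to numerical integration error in this conjugate example. In non-conjugate models neither density is available in closed form, which is where approximate methods would come in.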

Paper Structure

This paper contains 11 sections, 2 theorems, 20 equations, and 3 figures.

Key Result

Theorem 1

Under the previous assumptions, the minimum of the functional $\Delta[q(\theta|y_{-g})]$ is given by:

Figures (3)

  • Figure 1: Illustration of the Information Processing Rule
  • Figure 2: Illustration of the Information Deletion Rule. The interest lies in deleting the information contribution of $y_g$ from the posterior distribution $\pi(\theta|y)$.
  • Figure 3: The figure illustrates the role of $\pi(\theta|y_{-g})$ as prior or posterior distribution.

Theorems & Definitions (2)

  • Theorem 1: Optimal information deletion rule
  • Theorem 2: Equivalence of the IDR and Bayes' theorem