Jeffrey's update rule as a minimizer of Kullback-Leibler divergence
Carlos Pinzón, Catuscia Palamidessi
TL;DR
This work gives a concise, high-level proof that Jeffrey's update rule never increases the KL divergence $D_{\text{KL}}\left(\tau\,||\,\overrightarrow{C}(\theta)\right)$ when the parameter $\theta$ is updated. The log-likelihood is decomposed as $L(\theta)=Q(\theta|\theta_t)+H(\theta|\theta_t)$, and an EM-style argument shows that the Jeffrey update $\theta_{t+1}=\overleftarrow{C_{\theta_t}}(\tau)$ maximizes $Q$, while the Gibbs term $\Delta H$ is nonnegative. Together these give $\Delta L \geq 0$, which is equivalent to the KL divergence not increasing after the update. The paper extends the argument to full-image constraints and sparsity, showing that the Jeffrey posterior remains well-defined under mild positivity conditions. The result is a shorter, more accessible proof that places Jeffrey's rule within the Bayesian learning and EM frameworks.
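The link between $\Delta L$ and the KL divergence can be spelled out in one line. The sketch below assumes that $L(\theta)$ denotes the expected log-likelihood of the predictions under $\tau$, that is $L(\theta)=\sum_y \tau(y)\log \overrightarrow{C}(\theta)(y)$, and writes $\mathcal{H}(\tau)$ for the Shannon entropy of $\tau$; both are notational assumptions made here for illustration and may differ from the paper's own definitions.

$$
D_{\text{KL}}\left(\tau\,||\,\overrightarrow{C}(\theta)\right)
= \sum_y \tau(y)\log\tau(y) - \sum_y \tau(y)\log \overrightarrow{C}(\theta)(y)
= -\mathcal{H}(\tau) - L(\theta)
$$

$$
D_{\text{KL}}\left(\tau\,||\,\overrightarrow{C}(\theta_t)\right) - D_{\text{KL}}\left(\tau\,||\,\overrightarrow{C}(\theta_{t+1})\right)
= L(\theta_{t+1}) - L(\theta_t)
= \Delta Q + \Delta H \geq 0
$$

Since $\mathcal{H}(\tau)$ does not depend on $\theta$, any increase in $L$ is exactly a decrease in the divergence; $\Delta Q \geq 0$ because the update maximizes $Q$, and $\Delta H \geq 0$ by Gibbs' inequality.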
Abstract
In this paper, we give a more concise and higher-level proof than the original one, due to Bart Jacobs, of the following theorem: when Bayesian update rules are used to learn or update an internal state that produces predictions, applying Jeffrey's update rule to the internal state reduces the relative entropy between the observations and the predictions.
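The theorem can be checked numerically on a small example. The following is a minimal sketch, not the authors' code: it assumes a finite channel stored as a row-stochastic matrix `channel[x, y]` (the probability of outcome `y` given state `x`), an internal state `theta` over states, and an observed distribution `tau` over outcomes. The function names `predict`, `jeffrey_update`, and `kl` are illustrative, and the random inputs are strictly positive so that every posterior is well-defined.

```python
import numpy as np

def predict(channel, theta):
    # Forward prediction: P(y) = sum_x theta(x) * channel[x, y].
    return theta @ channel

def jeffrey_update(channel, theta, tau):
    # joint[x, y] = theta(x) * channel[x, y]
    joint = theta[:, None] * channel
    # posterior[x, y] = P(x | y), the Bayesian inversion of the channel under theta
    posterior = joint / joint.sum(axis=0)
    # Jeffrey's rule: mix the per-outcome posteriors with weights tau(y).
    return posterior @ tau

def kl(p, q):
    # Relative entropy D_KL(p || q), skipping terms where p is zero.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Assumed toy setup: 3 states, 4 outcomes, all probabilities strictly positive.
rng = np.random.default_rng(0)
channel = rng.dirichlet(np.ones(4), size=3)  # each row sums to 1
theta = rng.dirichlet(np.ones(3))
tau = rng.dirichlet(np.ones(4))

divergences = []
for _ in range(20):
    divergences.append(kl(tau, predict(channel, theta)))
    theta = jeffrey_update(channel, theta, tau)
```

After the loop, `divergences` holds the relative entropy between the observations and the predictions before each update; by the theorem, the sequence should be non-increasing up to floating-point error.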
