Uncertainty-Aware Bayes' Rule and Its Applications

Shixiong Wang

TL;DR

This work addresses robustness gaps in Bayes' rule when the prior and/or the data model are misspecified. It introduces an uncertainty-aware (UA) Bayes framework that blends prior beliefs and data evidence via an entropy-regularized optimization, yielding the general posterior $p_g(\theta|y)$ and the $(\alpha,\beta)$-posterior $p_g(\theta|y) \propto p^{\beta}(\theta) \cdot l_y^{\alpha}(\theta)$, where $p(\theta)$ is the prior and $l_y(\theta)$ is the likelihood distribution (the normalized likelihood). The paper derives fundamental properties of $\alpha$-scaled distributions, extends the rule to multi-sample and multi-prior settings, and demonstrates applications to Bayesian MAP classification, MMSE estimation, particle filtering, and state-space modeling, reporting improvements in uncertainty quantification and predictive performance. It also proposes practical tuning strategies (grid search and surrogate optimization) for selecting $(\alpha,\beta)$ in real-world tasks, validated by both synthetic illustrations and real-data experiments (text/image classification and state estimation). Overall, the UA framework provides a flexible, computationally lightweight approach to robust Bayesian inference that independently adjusts the influence of the prior and the likelihood while controlling posterior spread, which makes it relevant to robust signal processing and Bayesian learning.

Abstract

Bayes' rule has enabled innumerable powerful algorithms of statistical signal processing and statistical machine learning. However, when model misspecifications exist in prior and/or data distributions, the direct application of Bayes' rule is questionable. Philosophically, the key is to balance the relative importance between the prior information and the data evidence when calculating posterior distributions: If prior distributions are overly conservative (i.e., exceedingly spread), we upweight the prior belief; if prior distributions are overly aggressive (i.e., exceedingly concentrated), we downweight the prior belief. The same operation also applies to likelihood distributions, which are defined as normalized likelihoods if the normalization exists. This paper studies a generalized Bayes' rule, called uncertainty-aware (UA) Bayes' rule, to technically realize the above philosophy, thus combating model uncertainties in prior and/or data distributions. In particular, the advantage of the proposed UA Bayes' rule over the existing power posterior (i.e., $\alpha$-posterior) is investigated. Applications of the UA Bayes' rule to classification and estimation are discussed: Specifically, the UA naive Bayes classifier, the UA Kalman filter, the UA particle filter, and the UA interactive-multiple-model filter are suggested and experimentally validated.
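To make the upweighting and downweighting idea concrete, the following is a minimal sketch of the $(\alpha,\beta)$-posterior in the scalar Gaussian case, assuming it is the normalized product $p^{\beta}(\theta) \cdot l_y^{\alpha}(\theta)$ as stated in the TL;DR. Raising $\mathcal{N}(\theta; m, \sigma^2)$ to the power $\beta$ gives, after normalization, $\mathcal{N}(\theta; m, \sigma^2/\beta)$, so the exponents simply rescale the two precisions. The names `Gaussian` and `ab_posterior` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """Univariate Gaussian described by its mean and variance."""
    mean: float
    var: float


def ab_posterior(prior: Gaussian, lik: Gaussian, alpha: float, beta: float) -> Gaussian:
    """Normalize prior(theta)**beta * lik(theta)**alpha for two Gaussians.

    Assumption: `lik` is the likelihood distribution l_y(theta), i.e. the
    likelihood normalized over theta, and it is Gaussian in theta.
    """
    if alpha < 0 or beta < 0 or alpha + beta == 0:
        raise ValueError("need alpha, beta >= 0 and not both zero")
    # Exponents scale precisions: N(m, s2)**beta is proportional to N(m, s2 / beta).
    precision = beta / prior.var + alpha / lik.var
    mean = (beta * prior.mean / prior.var + alpha * lik.mean / lik.var) / precision
    return Gaussian(mean=mean, var=1.0 / precision)


if __name__ == "__main__":
    prior = Gaussian(mean=0.0, var=1.0)  # p(theta) = N(0, 1)
    lik = Gaussian(mean=5.0, var=1.0)    # l_y(theta) = N(5, 1)
    for alpha, beta in [(1.0, 1.0), (2.0, 1.0), (0.5, 0.5)]:
        print((alpha, beta), ab_posterior(prior, lik, alpha, beta))
```

With $\alpha = \beta = 1$ the function reduces to the conventional Gaussian posterior. Scaling both exponents by the same factor leaves the posterior mean unchanged and only changes the spread, whereas changing their ratio shifts the mean toward the prior or toward the data.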

Paper Structure (43 sections, 10 theorems, 94 equations, 14 figures, 5 tables)

Introduction
Technical and Experimental Details
Uncertainty-Aware Bayes' Rule
Properties of Uncertainty-Aware Bayes' Rule
Uncertainty-Aware Bayes' Rule for Multiple Samples
Uncertainty-Aware Bayes' Rule for Multiple Priors and Samples
Examples of Application
Bayesian MAP Classification
Bayesian MMSE Estimation
Other Examples
Parameter Tuning
Grid Search
Surrogate Optimization
Concrete Applications and Experiments
Illustrating Examples: Hidden-Quantity Estimation
...and 28 more sections

Key Result

Lemma 1

The posterior distribution $p(\bm \theta | \bm y)$ given by Bayes' rule eq:bayes-rule [or eq:bayes-rule-2] solves

Figures (14)

Figure 1: An $\alpha$-scaled version $\mathcal{N}(\theta; m, \sigma^2/\alpha)$ can be closer to the ground truth $\delta_0(\theta)$ than the unscaled distribution $\mathcal{N}(\theta; m, \sigma^2)$. Note that $\mathcal{N}(\theta; m, \sigma^2/\alpha)$ degenerates to $\delta_m(\theta)$ if $\alpha \to \infty$. Note also that for mean $m > 0$ and variance $\sigma^2 > 0$, under the order-$1$ Wasserstein distance, $\delta_m(\theta)$ is closer to $\delta_0(\theta)$ than the Gaussian distribution $\mathcal{N}(\theta; m, \sigma^2)$. However, in terms of probabilistic coverage, $\mathcal{N}(\theta; m, \sigma^2)$ is better than $\delta_m(\theta)$ because the true value $\theta_0 = 0$ can be included in the support of $\mathcal{N}(\theta; m, \sigma^2)$ but cannot be included in that of $\delta_m(\theta)$. Hence, whether $\mathcal{N}(\theta; m, \sigma^2/\alpha)$ is favored over $\mathcal{N}(\theta; m, \sigma^2)$, for some $\alpha \ge 0$, depends on a specific performance measure; recall Philosophy \ref{phi:robustness}.
Figure 2: A $3$-atom discrete distribution $h(\theta)$ and induced $h^{(\alpha)}(\theta)$ with different $\alpha \in \{0.5, 2\}$. Under $\alpha = 0.5$, $\operatorname{Ent} h^{(\alpha)}(\theta) > \operatorname{Ent} h(\theta)$ (i.e., the former has more balanced masses, while the latter has more unbalanced masses). Under $\alpha = 2$, $\operatorname{Ent} h^{(\alpha)}(\theta) < \operatorname{Ent} h(\theta)$.
Figure 3: The entropy difference $E(\alpha) \coloneqq \operatorname{Ent} h^{(\alpha)}(\bm \theta) - \operatorname{Ent} h(\bm \theta)$ against $\alpha$; $h(\bm \theta)$ is a randomly generated $50$-atom discrete distribution.
Figure 4: A visual illustration of the closeness $\operatorname{KL}\left[h(\bm \theta) \,\|\, h^{(\alpha)}(\bm \theta)\right]$ to $h(\bm \theta)$ from $h^{(\alpha)}(\bm \theta)$; $h(\bm \theta)$ is a randomly generated $50$-atom discrete distribution.
Figure 5: Illustrative examples of the $(\alpha,\beta)$-posterior. The prior distribution is $p(\theta) \coloneqq \mathcal{N} (\theta; 0, 1)$. The likelihood distribution is $l_{\bm y \coloneqq 5}(\theta) \coloneqq \mathcal{N} (\theta; 5, 1)$.
...and 9 more figures
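
The entropy behavior described in the captions of Figures 2 and 3 can be reproduced with a short sketch, assuming the $\alpha$-scaled distribution of a discrete $h$ is $h^{(\alpha)}(\theta) \propto h^{\alpha}(\theta)$. The three masses below are hypothetical and are not the ones plotted in the paper; the function names are likewise illustrative.

```python
import math


def alpha_scale(h, alpha):
    """Return the normalized alpha-th power of a discrete distribution.

    Assumption: every mass in `h` is strictly positive, so the support
    does not change for any alpha >= 0.
    """
    weights = [p ** alpha for p in h]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(h):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in h if p > 0)


if __name__ == "__main__":
    h = [0.6, 0.3, 0.1]  # hypothetical 3-atom distribution
    for alpha in (0.5, 1.0, 2.0):
        scaled = alpha_scale(h, alpha)
        # E(alpha): entropy difference between the scaled and original distribution.
        print(alpha, scaled, entropy(scaled) - entropy(h))
```

For a non-uniform $h$, the printed difference is positive for $\alpha < 1$ (masses become more balanced), zero at $\alpha = 1$, and negative for $\alpha > 1$ (masses become more concentrated).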

Theorems & Definitions (48)

Definition 1: Likelihood Distribution
Remark 1: Improper Likelihood
Lemma 1: Conventional Bayes' Rule
proof
Lemma 2
proof
Theorem 1: Uncertainty-Aware Bayes’ Rule
proof
Definition 2: $(\alpha,\beta)$-Posterior
Remark 2
...and 38 more
