Entropy-regularized Gradient Estimators for Approximate Bayesian Inference

Jasmeet Kaur

TL;DR

This work addresses uncertainty quantification under data scarcity by tackling the scalability of Bayesian posterior sampling. It introduces Entropy-Regularized Gradient Descent (ERGD) and its variant s-ERGD to steer gradient flows toward the true posterior by regularizing with the cross-entropy between the particle density and a kernel density, together with the KL divergence $D_{KL}(\rho \| \pi)$, under the Stein operator metric. ERGD augments particle-optimization variational inference (POVI) with kernel interactions to improve functional diversity and mitigate mode collapse, demonstrated on classification benchmarks and uncertainty-aware MB-RL planning. Results show competitive accuracy and uncertainty metrics, suggesting ERGD offers a scalable approach to uncertainty-aware learning in safety-critical applications.

Abstract

Effective uncertainty quantification is important for training modern predictive models with limited data, enhancing both accuracy and robustness. While Bayesian methods are effective for this purpose, they can be challenging to scale. When employing approximate Bayesian inference, ensuring the quality of samples from the posterior distribution in a computationally efficient manner is essential. This paper addresses the estimation of the Bayesian posterior to generate diverse samples by approximating the gradient flow of the Kullback-Leibler (KL) divergence and the cross entropy of the target approximation under the metric induced by the Stein operator. It presents empirical evaluations on classification tasks to assess the method's performance and discusses its effectiveness for Model-Based Reinforcement Learning that uses uncertainty-aware network dynamics models.
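
For orientation, the KL gradient flow under the Stein-operator metric mentioned in the abstract is what standard Stein Variational Gradient Descent (SVGD, one of the baselines in Figure 1) discretizes. The sketch below is a minimal NumPy version of one SVGD update, assuming an RBF kernel with a fixed bandwidth `h`. The function names, step size, and bandwidth are illustrative assumptions, not taken from the paper, and the cross-entropy regularizer that distinguishes ERGD is not shown because its exact form is not given here.

```python
import numpy as np


def rbf_kernel(x, h):
    """RBF kernel matrix and pairwise differences for particles x of shape (n, d)."""
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared distances, shape (n, n)
    k = np.exp(-sq_dist / (2.0 * h ** 2))     # k[i, j] = k(x_i, x_j)
    return k, diff


def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One standard SVGD update (assumed baseline, not the paper's ERGD).

    x          : (n, d) array of particles
    grad_log_p : callable returning the score of the target at each particle, shape (n, d)
    """
    n = x.shape[0]
    k, diff = rbf_kernel(x, h)
    # Driving term: kernel-weighted average of scores pulls particles toward high density.
    drive = k @ grad_log_p(x)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j k[i, j] * (x_i - x_j) / h^2.
    # This pushes particles apart and counteracts mode collapse.
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h ** 2
    return x + step * (drive + repulse) / n


def run_svgd(x0, grad_log_p, n_iter=1000, step=0.1, h=1.0):
    """Apply svgd_step repeatedly and return the final particles."""
    x = x0.copy()
    for _ in range(n_iter):
        x = svgd_step(x, grad_log_p, step=step, h=h)
    return x
```

As a usage example under the same assumptions, calling `run_svgd(x0, lambda x: -x)` with particles `x0` initialized far from the origin targets a standard normal: the driving term moves the particles toward zero while the repulsive term keeps them spread out instead of collapsing onto the mode.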

Paper Structure

This paper contains 17 sections, 32 equations, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: Starting from the top-left: initial distribution, SVGD, kde-WGD, sse-WGD, SGD. Then: initial distribution, ERGD with a linear and a tanh schedule, ERGD with $s$-ERGD for an initial distribution followed by a linear schedule, and $s$-ERGD.
  • Figure 2: Comparing SGD with ERGD for Network Size
  • Figure 3: Mujoco Environments [todorov2012mujoco]