Variational Bayesian Pseudo-Coreset

Hyungi Lee, Seungyoo Lee, Juho Lee

TL;DR

Variational Bayesian Pseudo-Coreset (VBPC) tackles the high computational burden of posterior estimation in Bayesian neural networks by learning a small pseudo-coreset and performing last-layer variational inference. Because it uses a Gaussian likelihood for the inner problem on the coreset, VBPC obtains a closed-form solution for the last-layer variational posterior and avoids both stop-gradient tricks and weight-space sampling, which enables memory-efficient Bayesian model averaging. The framework trains the pseudo-coreset with bilevel optimization over a pool of models so that it generalizes across feature maps. It achieves strong accuracy (ACC), notably improved negative log-likelihood (NLL) on several benchmark datasets, and robustness under distribution shift. Overall, VBPC reduces both memory and computation while preserving or improving uncertainty estimates, offering a practical approach to scalable Bayesian inference in deep learning.
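With a Gaussian likelihood and a Gaussian prior on the last-layer weights, the posterior given the pseudo-coreset features is itself Gaussian and has a closed form. The sketch below writes this as standard Bayesian linear regression on fixed features and adds the analytic predictive distribution that replaces weight sampling. It assumes an isotropic prior, one-hot labels treated as regression targets, and fixed `noise_var`/`prior_var`. The function names are hypothetical, and the paper's exact parameterization may differ.

```python
import numpy as np


def last_layer_posterior(phi_s, y_s, noise_var=1.0, prior_var=1.0):
    """Closed-form Gaussian posterior over last-layer weights W (d x k).

    Assumed model (a Bayesian linear regression stand-in, not necessarily
    the paper's exact parameterization):
        y = phi @ W + eps,  eps ~ N(0, noise_var * I)
        W[:, j] ~ N(0, prior_var * I), independently for each output j.
    phi_s: (n, d) pseudo-coreset features; y_s: (n, k) targets.
    Returns the posterior mean (d, k) and the covariance (d, d) shared by all outputs.
    """
    d = phi_s.shape[1]
    precision = phi_s.T @ phi_s / noise_var + np.eye(d) / prior_var
    mean = np.linalg.solve(precision, phi_s.T @ y_s) / noise_var
    cov = np.linalg.inv(precision)
    return mean, cov


def predictive(phi_x, mean, cov, noise_var=1.0):
    """Predictive distribution with W integrated out analytically.

    Returns per-input means (m, k) and a variance (m,) shared across outputs,
    so Bayesian model averaging needs no weight samples.
    """
    pred_mean = phi_x @ mean
    # phi^T Sigma phi for each row of phi_x, plus observation noise
    pred_var = np.einsum("md,de,me->m", phi_x, cov, phi_x) + noise_var
    return pred_mean, pred_var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 10, 8, 10                   # tiny coreset: one example per class
    phi_s = rng.normal(size=(n, d))       # hypothetical feature-map outputs
    y_s = np.eye(k)[np.arange(n) % k]     # one-hot labels as regression targets
    mean, cov = last_layer_posterior(phi_s, y_s, noise_var=0.1)
    mu, var = predictive(rng.normal(size=(3, d)), mean, cov, noise_var=0.1)
    print(mu.argmax(axis=1), var)
```

In a bilevel setup, the pseudo-coreset would be updated by differentiating an outer loss through these closed-form expressions, for example with an autodiff framework. This NumPy sketch covers only the inner solution and the prediction step.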

Abstract

The success of deep learning requires large datasets and extensive training, which can create significant computational challenges. To address these challenges, pseudo-coresets, small learnable datasets that mimic the entire dataset, have been proposed. Bayesian neural networks, which offer predictive uncertainty and a probabilistic interpretation of deep neural networks, also struggle with large-scale datasets because of their high-dimensional parameter space. Prior work on Bayesian Pseudo-Coresets (BPC) attempts to reduce the cost of computing the weight posterior distribution by using a small pseudo-coreset, but it suffers from memory inefficiency during BPC training and yields sub-optimal results. To overcome these limitations, we propose Variational Bayesian Pseudo-Coreset (VBPC), a novel approach that uses variational inference to approximate the posterior distribution efficiently, reducing memory usage and computational cost while improving performance across benchmark datasets.
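A pseudo-coreset is a small set of learnable inputs and labels. The figure captions below compare two starting points for it: images drawn from a uniform distribution and images sampled from the real training set. The following is a minimal sketch of both initializations. It assumes `ipc` examples per class, one-hot labels, and inputs scaled to [0, 1) for the uniform case; the paper's actual range and preprocessing are not stated here. `init_pseudo_coreset` is a hypothetical helper.

```python
import numpy as np


def init_pseudo_coreset(x_train, y_train, num_classes, ipc, mode="real", seed=0):
    """Initialize a learnable pseudo-coreset with `ipc` examples per class.

    mode="real":    copy randomly chosen training examples of each class.
    mode="uniform": draw inputs uniformly at random in [0, 1) (assumed range).
    Labels start as one-hot vectors; a method may also learn them.
    """
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for c in range(num_classes):
        if mode == "real":
            idx = np.flatnonzero(y_train == c)           # indices of class c
            chosen = rng.choice(idx, size=ipc, replace=False)
            xs.append(x_train[chosen])
        elif mode == "uniform":
            xs.append(rng.uniform(0.0, 1.0, size=(ipc,) + x_train.shape[1:]))
        else:
            raise ValueError(f"unknown mode: {mode}")
        ys.append(np.tile(np.eye(num_classes)[c], (ipc, 1)))
    # Both arrays would then be treated as trainable parameters
    return np.concatenate(xs).astype(np.float32), np.concatenate(ys).astype(np.float32)
```

For a CIFAR10-sized problem with `ipc=10`, this yields 100 learnable images and 100 one-hot label rows.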

Paper Structure

This paper contains 76 sections, 42 equations, 16 figures, 20 tables, 2 algorithms.

Figures (16)

  • Figure 1: Learned VBPC images for Fashion-MNIST (ipc=10; left), CIFAR10 (ipc=10; middle), and CIFAR100 (ipc=1; right), where ipc denotes images per class. These images constitute the trained mean of the distribution $\mathcal{S}^*$.
  • Figure 2: Learned VBPC images from random initialization for the CIFAR10 ipc 1 (top) and ipc 10 (bottom) cases. The left panel shows random images sampled from a uniform distribution, and the right panel shows the trained VBPC images initialized from them. Training from random initialization successfully learns semantic information from the full dataset.
  • Figure 3: Learned VBPC images initialized from images randomly sampled from the original training dataset for the CIFAR10 ipc 1 (top) and ipc 10 (bottom) cases. The left panel shows the initial images sampled from the original dataset, and the right panel shows the final learned VBPC images.
  • Figure 4: Learned VBPC images obtained with different optimizers for the CIFAR10 ipc 1 (top) and ipc 10 (bottom) cases. The left panel shows the images learned with the LAMB optimizer, and the right panel shows those learned with the Adam optimizer.
  • Figure 5: Learned VBPC images obtained with different maximum numbers of update steps for the model pool elements in the CIFAR100 ipc 10 experiment. The left panel shows the $T=100$ case, which is the default setting for all experiments; the middle and right panels show the $T=200$ and $T=400$ cases. The learned images differ only slightly in appearance.
  • ...and 11 more figures
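Figure 5 varies $T$, the maximum number of update steps for each model in the pool. The following is a minimal bookkeeping sketch of such a pool. It assumes that each outer step updates one randomly chosen model once and that a model is re-initialized after $T$ updates, so the pseudo-coreset sees a variety of feature maps. `ModelPool`, `PoolEntry`, `init_params`, and the list-of-floats "parameters" are hypothetical stand-ins for real networks. This is not the authors' code.

```python
import random
from dataclasses import dataclass


@dataclass
class PoolEntry:
    params: list    # hypothetical stand-in for a network's weights
    steps: int = 0  # updates received since (re)initialization


def init_params(rng, dim=4):
    """Hypothetical random initialization of one model's parameters."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


class ModelPool:
    """Hypothetical model pool: each entry receives at most `max_steps`
    updates (T in Figure 5) before it is re-initialized."""

    def __init__(self, size, max_steps=100, seed=0):
        self.rng = random.Random(seed)
        self.max_steps = max_steps
        self.entries = [PoolEntry(init_params(self.rng)) for _ in range(size)]

    def sample(self):
        """Pick the index of a pool member uniformly at random."""
        return self.rng.randrange(len(self.entries))

    def update(self, i, step_fn):
        """Apply one training step to entry i; reset it once it reaches T steps."""
        entry = self.entries[i]
        entry.params = step_fn(entry.params)
        entry.steps += 1
        if entry.steps >= self.max_steps:
            self.entries[i] = PoolEntry(init_params(self.rng))


if __name__ == "__main__":
    pool = ModelPool(size=5, max_steps=100)
    for _ in range(1000):
        i = pool.sample()
        # (omitted) use pool.entries[i] as the feature map for the coreset outer loss
        # toy update standing in for whatever training step the method applies
        pool.update(i, lambda p: [0.99 * w for w in p])
    print([e.steps for e in pool.entries])
```

A larger $T$ lets pool members train longer before they are reset, which is the knob Figure 5 varies.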