Posterior Approximation using Stochastic Gradient Ascent with Adaptive Stepsize

Kart-Leong Lim, Xudong Jiang

TL;DR

This paper tackles the scalability of posterior approximation for Bayesian nonparametrics (BNPs), focusing on Dirichlet process mixtures (DPM), for which closed-form variational inference (VI) is limiting on large datasets. It reformulates VI as a stochastic gradient ascent (SGA) problem and introduces two enhancements, momentum (SGA+M) and Fisher-information-based natural gradients (SGA+F), to provide adaptive stepsizes and faster convergence. The proposed SGA-based learners are applied to DPM for unsupervised image classification. They enable learning with minibatches and high-dimensional deep features (e.g., FC7 from VGG16) while matching or exceeding batch VI in NMI, ACC, and model selection, with substantial reductions in computational time. Empirical results on large-scale datasets such as Caltech101/256 and SUN397 demonstrate robust convergence, effective model selection, and strong scalability, highlighting the practical value of scalable Bayesian nonparametrics for vision tasks, especially with deep feature representations. The method combines DPM, VI, SGA, the Fisher information $F_{\theta}$, and natural gradients to achieve efficient posterior approximation in high-dimensional settings, offering a pathway toward broader adoption of BNPs in large-scale, real-world problems.
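NMI and ACC, the clustering metrics cited above, compare predicted cluster indices with ground-truth classes while ignoring how the clusters happen to be numbered. The stdlib-only Python sketch below assumes NMI normalized by $\sqrt{H(Y)H(C)}$. Other normalizations, such as the arithmetic mean of the two entropies, are also common, and the paper's exact choice is not stated in this summary. It computes ACC under the best one-to-one cluster-to-class mapping, found here by brute force. The function names `nmi` and `cluster_accuracy` are hypothetical. With hundreds of clusters, as in Caltech256 or SUN397, the brute-force search would have to be replaced by the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
import math
from collections import Counter
from itertools import permutations


def nmi(labels_true, labels_pred):
    """Normalized mutual information, normalized by sqrt(H(Y) * H(C))."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # I(Y; C) = sum p(a, b) * log(p(a, b) / (p(a) * p(b)))
    mi = sum(c / n * math.log(c * n / (count_true[a] * count_pred[b]))
             for (a, b), c in joint.items())
    h_true = -sum(c / n * math.log(c / n) for c in count_true.values())
    h_pred = -sum(c / n * math.log(c / n) for c in count_pred.values())
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case: a single class and/or a single cluster.
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)


def cluster_accuracy(labels_true, labels_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping.

    Brute force over all permutations, so only usable for a handful of
    clusters. Assumes non-empty, sortable labels.
    """
    joint = Counter(zip(labels_true, labels_pred))
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    # Pad the shorter list with None so every cluster gets a partner;
    # a cluster paired with None contributes no correct points.
    m = max(len(classes), len(clusters))
    classes += [None] * (m - len(classes))
    clusters += [None] * (m - len(clusters))
    best = max(
        sum(joint[(cls, clu)] for cls, clu in zip(perm, clusters))
        for perm in permutations(classes)
    )
    return best / len(labels_true)
```

For example, `cluster_accuracy([0, 0, 0, 1, 1, 1, 2, 2, 2], [1, 1, 1, 0, 0, 2, 2, 2, 2])` returns 8/9: after relabeling clusters 1, 0, 2 as classes 0, 1, 2, only the sixth point is misassigned.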

Abstract

Scalable algorithms for posterior approximation allow Bayesian nonparametrics such as the Dirichlet process mixture to scale up to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local learning from minibatches. The main problem with stochastic variational inference is that it relies on closed-form solutions. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. In this work, we explore using stochastic gradient ascent as a fast algorithm for the posterior approximation of the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning. To achieve both speed and performance, we turn our focus to stepsize optimization in stochastic gradient ascent. As an intermediate approach, we first optimize the stepsize using the momentum method. Finally, we introduce Fisher information to allow an adaptive stepsize in our posterior approximation. In the experiments, we demonstrate that our approach using stochastic gradient ascent does not sacrifice performance for speed when compared to closed-form coordinate ascent learning on the evaluated datasets. Lastly, our approach is compatible with deep ConvNet features and scales to datasets with many classes, such as Caltech256 and SUN397.
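The three stepsize strategies named in the abstract (plain SGA, momentum, and a Fisher-information natural gradient) can be contrasted on a toy problem. The sketch below is not the authors' DPM learner. It fits the mean and standard deviation of a single univariate Gaussian by minibatch stochastic gradient ascent on the average log-likelihood. That toy model is an assumption, chosen because its Fisher information in $(\mu, \sigma)$, $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$, has a closed form. The names `grad_loglik` and `fit`, the Robbins-Monro schedule $\rho_t = (t+\tau)^{-\kappa}$, the initial values, and all hyperparameters are illustrative choices.

```python
import random


def grad_loglik(batch, mu, sigma):
    """Gradient of the average Gaussian log-likelihood w.r.t. (mu, sigma)."""
    n = len(batch)
    g_mu = sum(x - mu for x in batch) / (n * sigma ** 2)
    mean_sq = sum((x - mu) ** 2 for x in batch) / n
    g_sigma = mean_sq / sigma ** 3 - 1.0 / sigma
    return g_mu, g_sigma


def fit(data, rule, steps=300, batch_size=64, tau=1.0, kappa=0.7,
        beta=0.9, seed=1):
    """Minibatch stochastic gradient ascent with one of three update rules.

    rule = "sga"      : theta += rho_t * g
    rule = "momentum" : v = beta * v + rho_t * g; theta += v
    rule = "fisher"   : theta += rho_t * F(theta)^{-1} g  (natural gradient)
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 5.0           # initial guess (mean far from the example data)
    v_mu = v_sigma = 0.0           # momentum buffers
    for t in range(steps):
        rho = (t + tau) ** (-kappa)  # decaying Robbins-Monro stepsize
        batch = rng.sample(data, batch_size)
        g_mu, g_sigma = grad_loglik(batch, mu, sigma)
        if rule == "sga":
            mu += rho * g_mu
            sigma += rho * g_sigma
        elif rule == "momentum":
            v_mu = beta * v_mu + rho * g_mu
            v_sigma = beta * v_sigma + rho * g_sigma
            mu += v_mu
            sigma += v_sigma
        elif rule == "fisher":
            # Fisher information of N(mu, sigma^2) in (mu, sigma) is
            # diag(1/sigma^2, 2/sigma^2); multiplying the gradient by its
            # inverse rescales each coordinate by the local curvature.
            mu += rho * sigma ** 2 * g_mu
            sigma += rho * 0.5 * sigma ** 2 * g_sigma
        else:
            raise ValueError(f"unknown rule: {rule}")
        sigma = max(sigma, 1e-3)   # keep the scale parameter positive
    return mu, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(10.0, 5.0) for _ in range(5000)]
    for rule in ("sga", "momentum", "fisher"):
        mu, sigma = fit(data, rule)
        print(f"{rule:9s} mu={mu:6.3f} sigma={sigma:6.3f}")
```

With a shared schedule, the plain gradient for $\mu$ is scaled by $1/\sigma^2$, so plain SGA moves slowly when the data have a large spread. Premultiplying by the inverse Fisher information removes that scaling, which is one way to read "adaptive stepsize" here. Momentum instead accumulates past gradients without using any curvature information.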

Paper Structure

This paper contains 33 sections, 31 equations, 4 figures, 3 tables, and 1 algorithm.

Figures (4)

  • Figure 1: CPU time (as a factor of MM) and minibatch size
  • Figure 2: NMI and Accuracy
  • Figure 3: Convergence plots of SGA (top) and SGA+Fisher (bot) on Caltech256
  • Figure 4: Convergence plots of SGA (top) and SGA+Fisher (bot) on SUN397