
Accelerating Convergence in Bayesian Few-Shot Classification

Tianjun Ke, Haoqun Cao, Feng Zhou

TL;DR

This paper tackles non-conjugate inference in Bayesian few-shot classification (FSC) based on Gaussian processes (GPs) by introducing mirror descent-based variational inference (MD-VI). MD-VI recasts the VI updates for GP-based FSC as conjugate-like steps that exploit non-Euclidean geometry. This yields faster inner-loop convergence and invariance to the parameterization of the variational distribution, while preserving uncertainty quantification. The authors present a bi-level framework: task-specific VI updates run in the inner loop, and a deep-kernel GP prior is learned in the outer loop. They establish the update's theoretical equivalence to natural gradient methods and give it a conjugate Bayesian interpretation. Empirically, the method achieves competitive accuracy and improved calibration across multiple FSC benchmarks, and the authors analyze hyperparameters and convergence behavior systematically. Overall, MD-BSFC offers a principled, efficient alternative for Bayesian meta-learning in few-shot classification, and its code is publicly available.
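The sketch below illustrates what a "conjugate-like" step can mean. It is a minimal example under our own assumptions: mirror-descent VI in the conjugate-computation form of khan2017conjugate (the reference listed for Theorem 3.2), applied to a single latent value f. The prior is N(0, k) and the likelihood is logistic, sigmoid(y f), with y in {-1, +1}. This is a 1-D toy, not the paper's multi-class GP implementation. The function names, the step size `beta`, and the quadrature grid are all hypothetical choices. Each iteration builds a Gaussian pseudo-likelihood from the gradient of the expected log-likelihood and adds it to the prior's natural parameters, which is the same arithmetic as a conjugate Gaussian update.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def gauss_expect(fn, m, v, n=401, width=8.0):
    """E_{f ~ N(m, v)}[fn(f)] by the trapezoidal rule on a standard-normal grid."""
    h = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -width + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * fn(m + math.sqrt(v) * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def mirror_descent_vi(y, k=1.0, beta=0.5, iters=50):
    """Hypothetical 1-D mirror-descent VI: prior N(0, k), likelihood sigmoid(y * f)."""
    lam1, lam2 = 0.0, 0.0      # natural params of the Gaussian pseudo-likelihood
    m, v = 0.0, k              # q starts at the prior
    for _ in range(iters):
        # d/dm and d/dv of E_q[log lik] (Bonnet and Price identities)
        g_m = gauss_expect(lambda f: y * sigmoid(-y * f), m, v)
        g_v = 0.5 * gauss_expect(lambda f: -sigmoid(y * f) * sigmoid(-y * f), m, v)
        # Chain rule to mean parameters (m, v + m^2), then a convex-combination step
        lam1 = (1.0 - beta) * lam1 + beta * (g_m - 2.0 * m * g_v)
        lam2 = (1.0 - beta) * lam2 + beta * g_v
        # Posterior natural params = prior natural params + pseudo-likelihood
        theta1, theta2 = lam1, -0.5 / k + lam2
        v = -0.5 / theta2
        m = theta1 * v
    return m, v

def elbo(y, m, v, k=1.0):
    """E_q[log sigmoid(y f)] - KL(N(m, v) || N(0, k))."""
    kl = 0.5 * (v / k + m * m / k - 1.0 + math.log(k / v))
    return gauss_expect(lambda f: log_sigmoid(y * f), m, v) - kl
```

The recursion `lam <- (1 - beta) * lam + beta * grad` is mirror descent on the ELBO in mean parameters. With a Gaussian likelihood the gradient would be constant, and `beta = 1.0` would reach the exact posterior in a single step.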

Abstract

Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It is also invariant to the parameterization of the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and model components. Code is publicly available at https://github.com/keanson/MD-BSFC.
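The accelerated-convergence claim can be seen on a toy problem. The sketch below is our own illustration, not an experiment from the paper. It fits q = N(m, v) to a hypothetical Gaussian target N(a, b). In this fully conjugate case, the ELBO gradient with respect to the mean parameters equals the target's natural parameters minus q's. A mirror-descent step therefore becomes a convex combination of natural parameters. Plain Euclidean gradient ascent on (m, v) serves as the baseline. The target, the function names, and the step sizes are all assumptions.

```python
import math

def neg_kl(m, v, a, b):
    """ELBO up to a constant for q = N(m, v) and target N(a, b): -KL(q || p)."""
    return -0.5 * (v / b + (m - a) ** 2 / b - 1.0 + math.log(b / v))

def mirror_descent(m, v, a, b, beta, steps):
    """Mirror descent in mean parameters, carried out in natural parameters."""
    t1, t2 = m / v, -0.5 / v                  # natural parameters of q
    for _ in range(steps):
        # Gradient w.r.t. mean params = target natural params - current ones
        t1 += beta * (a / b - t1)
        t2 += beta * (-0.5 / b - t2)
    v = -0.5 / t2
    return t1 * v, v

def gradient_ascent(m, v, a, b, lr, steps):
    """Euclidean gradient ascent on (m, v), used as a baseline."""
    for _ in range(steps):
        gm = -(m - a) / b
        gv = 0.5 / v - 0.5 / b
        m, v = m + lr * gm, max(v + lr * gv, 1e-8)   # clamp keeps v positive
    return m, v

a, b = 3.0, 0.5                                # hypothetical target N(3, 0.5)
m_md, v_md = mirror_descent(0.0, 4.0, a, b, beta=1.0, steps=1)
m_gd, v_gd = gradient_ascent(0.0, 4.0, a, b, lr=0.1, steps=30)
```

The step sizes are illustrative only. Because the natural-parameter step follows the geometry of q, `beta = 1.0` lands on the optimum in one step in this conjugate case. The Euclidean step on the variance moves slowly while v is large, where the ELBO is nearly flat in v.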

Paper Structure (34 sections, 2 theorems, 23 equations, 4 figures, 4 tables, 1 algorithm)

Key Result

Theorem 3.1

Given the ELBO under two parameterizations, $\widetilde{\mathcal{L}}(\bm{\mu})=\mathcal{L}(\bm{\theta})$, where $\bm{\mu}$ is the mean parameter and $\bm{\theta}$ the natural parameter, to maximize the ELBO, mirror descent over the mean parameter using the Bregman divergence (bregman1967relaxation) $B_H(\bm{\mu},\bm{\mu}_t)=H(\bm{\mu})-H(\bm{\mu}_t)-\nabla H(\bm{\mu}_t)^\top(\bm{\mu}-\bm{\mu}_t)$ induced by the negative entropy function $H(\cdot)$
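The TL;DR describes Theorem 3.1 as an equivalence between this mirror-descent update and natural gradient methods. The code below is a minimal numerical sketch of that claim under our own assumptions. It uses a one-parameter Bernoulli family in place of the paper's Gaussian variational distribution, a hypothetical quadratic stand-in for the ELBO, and the standard step form with step size $\beta$. It checks two things. First, that $B_H$ with the negative entropy equals $\mathrm{KL}(q_{\mu}\,\|\,q_{\mu_t})$. Second, that solving $\mu_{t+1}=\arg\max_{\mu}\,\widetilde{\mathcal{L}}'(\mu_t)\,\mu-\beta^{-1}B_H(\mu,\mu_t)$ numerically gives the same natural parameter as the natural-gradient step $\theta_t+\beta F(\theta_t)^{-1}\nabla_{\theta}\mathcal{L}(\theta_t)$. All function names are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(mu):
    return math.log(mu / (1.0 - mu))

def neg_entropy(mu):
    # H(mu) for Bernoulli(mu); convex in the mean parameter, with H'(mu) = logit(mu)
    return mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)

def bregman(mu, mu_t):
    # B_H(mu, mu_t) = H(mu) - H(mu_t) - H'(mu_t) * (mu - mu_t)
    return neg_entropy(mu) - neg_entropy(mu_t) - logit(mu_t) * (mu - mu_t)

def kl_bernoulli(mu, nu):
    return mu * math.log(mu / nu) + (1.0 - mu) * math.log((1.0 - mu) / (1.0 - nu))

def objective(mu):
    # Hypothetical stand-in for the ELBO as a function of the mean parameter
    return -(mu - 0.8) ** 2

def d_objective(mu):
    return -2.0 * (mu - 0.8)

def mirror_step(mu_t, beta):
    """argmax_mu g * mu - B_H(mu, mu_t) / beta via ternary search (concave in mu)."""
    g = d_objective(mu_t)
    phi = lambda mu: g * mu - bregman(mu, mu_t) / beta
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            lo = m1
        else:
            hi = m2
    return logit(0.5 * (lo + hi))              # reported as a natural parameter

def natural_gradient_step(theta_t, beta, h=1e-6):
    """theta + beta * F(theta)^{-1} dL/dtheta, with Fisher F(theta) = sigmoid'(theta)."""
    dl = (objective(sigmoid(theta_t + h)) - objective(sigmoid(theta_t - h))) / (2.0 * h)
    fisher = sigmoid(theta_t) * (1.0 - sigmoid(theta_t))
    return theta_t + beta * dl / fisher
```

Both steps reduce to $\theta_{t+1}=\theta_t+\beta\,\nabla_{\mu}\widetilde{\mathcal{L}}(\mu_t)$, because $\nabla H(\mu)=\theta$ and $F(\theta)=\partial\mu/\partial\theta$. The ternary search is there only to solve the mirror-descent subproblem without assuming that identity.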

Figures (4)

  • Figure 1: Overview of the MD-BSFC training process. The diagram illustrates the bi-level optimization, which iterates mirror-descent updates for VI (inner loop) and hyperparameter tuning (outer loop).
  • Figure 2: Reliability diagrams for 5-shot classification with ECE and MCE metrics. MI denotes mini-ImageNet and Omni denotes Omniglot. Results are computed on 3,000 test tasks.
  • Figure 3: ELBO curves over 30 iterations on the CUB and mini-ImageNet datasets for 1-shot and 5-shot classification. Our method with mirror descent (MD) converges faster than the vanilla method with gradient descent (GD) in both settings.
  • Figure 4: Loss curves over 30 iterations on the CUB and mini-ImageNet datasets for 1-shot and 5-shot classification. Our method with mirror descent (MD) converges faster than the vanilla method with gradient descent (GD) in both settings.
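Figure 2 reports expected and maximum calibration error (ECE and MCE). The code below is a minimal sketch of the usual binned definitions: $\mathrm{ECE}=\sum_m \frac{|B_m|}{n}\,|\mathrm{acc}(B_m)-\mathrm{conf}(B_m)|$ and $\mathrm{MCE}=\max_m |\mathrm{acc}(B_m)-\mathrm{conf}(B_m)|$. It assumes equal-width confidence bins on [0, 1], with 10 bins by default; the bin count behind Figure 2 is not stated in this summary. It also assumes that each prediction's confidence is the probability of the predicted class. `calibration_errors` is a hypothetical helper name.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins on [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)   # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    ece, mce = 0.0, 0.0
    for b in bins:
        if not b:
            continue                             # empty bins contribute nothing
        acc = sum(ok for _, ok in b) / len(b)    # fraction of correct predictions
        conf = sum(c for c, _ in b) / len(b)     # mean confidence in the bin
        gap = abs(acc - conf)
        ece += len(b) / n * gap                  # weighted by bin occupancy
        mce = max(mce, gap)                      # worst-case bin
    return ece, mce
```

For example, four predictions at confidence 0.95 that are all correct, plus two at 0.65 with one correct, give ECE = 1/12 and MCE = 0.15.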

Theorems & Definitions (4)

  • Theorem 3.1: raskutti2015information
  • Theorem 3.2: khan2017conjugate
  • proof
  • proof