Accelerating Convergence in Bayesian Few-Shot Classification

Tianjun Ke; Haoqun Cao; Feng Zhou

TL;DR

This paper tackles non-conjugate inference in Bayesian few-shot classification (FSC) based on Gaussian processes (GPs) by introducing mirror descent-based variational inference (MD-VI). MD-VI recasts the VI updates of GP-based FSC as conjugate-like steps that exploit non-Euclidean geometry. This gives faster inner-loop convergence and invariance to the parameterization of the variational distribution while preserving uncertainty quantification. The authors present a bi-level framework in which task-specific VI updates run in the inner loop and a deep-kernel GP prior is learned in the outer loop. They show that the mirror descent update is equivalent to natural gradient descent and admits a conjugate Bayesian interpretation. Experiments on multiple FSC benchmarks show competitive accuracy and improved calibration, together with a systematic analysis of hyperparameters and convergence behavior. Overall, MD-BSFC offers a principled, efficient alternative for Bayesian meta-learning in few-shot classification, and its code is publicly available.
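To illustrate what a "conjugate-like" inner-loop step can look like, here is a minimal sketch under stated assumptions rather than the authors' implementation. It uses a single latent function value $f$ with prior $\mathcal{N}(0, \sigma_0^2)$, a logistic likelihood $p(y \mid f)=\mathrm{sigmoid}(yf)$ with $y \in \{-1,+1\}$, and a Gaussian $q(f)=\mathcal{N}(m, v)$. Under these assumptions, mirror descent on the ELBO in mean parameters $\bm{\mu}=(m, m^2+v)$ reduces to a convex combination in natural parameters, $\bm{\lambda}_{t+1}=(1-\rho)\bm{\lambda}_t+\rho\,(\bm{\eta}+\nabla_{\bm{\mu}}\mathbb{E}_q[\log p(y\mid f)])$, which is the form given in khan2017conjugate. The names `PRIOR_VAR`, `md_step`, and `expect`, and the fixed-grid quadrature, are hypothetical choices made for this example.

```python
import math

# Illustrative assumptions: one latent value f, prior N(0, PRIOR_VAR),
# logistic likelihood p(y | f) = sigmoid(y * f) with y in {-1, +1},
# and a Gaussian q(f) = N(m, v) stored by natural parameters (lam1, lam2).
PRIOR_VAR = 1.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Fixed standard-normal grid on [-8, 8]; normalized weights approximate E_{N(0,1)}.
Z = [-8.0 + 0.01 * i for i in range(1601)]
_RAW = [math.exp(-0.5 * z * z) for z in Z]
_TOTAL = sum(_RAW)
W = [r / _TOTAL for r in _RAW]


def expect(fn, m, v):
    """Approximate E_{N(m, v)}[fn(f)] on the fixed grid."""
    s = math.sqrt(v)
    return sum(w * fn(m + s * z) for w, z in zip(W, Z))


def to_mean_var(lam1, lam2):
    """Natural parameters (m / v, -1 / (2 v)) -> (m, v)."""
    v = -0.5 / lam2
    return lam1 * v, v


def md_step(lam1, lam2, y, rho):
    """One mirror-descent step on the ELBO, written in natural parameters."""
    m, v = to_mean_var(lam1, lam2)
    # Bonnet and Price identities: dE/dm = E[l'(f)], dE/dv = E[l''(f)] / 2.
    g_m = expect(lambda f: y * sigmoid(-y * f), m, v)
    g_v = 0.5 * expect(lambda f: -sigmoid(y * f) * sigmoid(-y * f), m, v)
    # Chain rule to mean parameters (mu1, mu2) = (m, m^2 + v).
    grad_mu1 = g_m - 2.0 * g_v * m
    grad_mu2 = g_v
    # Prior natural parameters eta = (0, -1 / (2 * PRIOR_VAR)).
    new1 = (1.0 - rho) * lam1 + rho * (0.0 + grad_mu1)
    new2 = (1.0 - rho) * lam2 + rho * (-0.5 / PRIOR_VAR + grad_mu2)
    return new1, new2


lam1, lam2 = 0.0, -0.5 / PRIOR_VAR  # start from q = prior
for _ in range(10):
    lam1, lam2 = md_step(lam1, lam2, y=+1, rho=0.5)
m, v = to_mean_var(lam1, lam2)
print(f"q(f) = N({m:.4f}, {v:.4f})")
```

Observing a positive label moves the mean of $q$ above zero and shrinks its variance below the prior's. Each step adds the prior's natural parameters to a Gaussian pseudo-likelihood term $\nabla_{\bm{\mu}}\mathbb{E}_q[\log p(y\mid f)]$, which is the conjugate Bayesian reading of the update.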

Abstract

Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits parameterization invariance with respect to the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. We also investigate the impact of hyperparameters and model components. Code is publicly available at https://github.com/keanson/MD-BSFC.

Paper Structure (34 sections, 2 theorems, 23 equations, 4 figures, 4 tables, 1 algorithm)

Introduction
Background
Few-shot Classification
Gaussian Process Classification with Variational Inference
Exponential Family
Variational Inference with Natural Gradient Descent
Variational Inference with Mirror Descent
Methodology
From Natural Gradient Descent to Mirror Descent
From Mirror Descent to Conjugate Bayesian Inference
Algorithm
Experiments
Accuracy
Uncertainty Quantification
Convergence Rate
(19 further sections not listed)

Key Result

Theorem 3.1

Given the ELBO parameterized in two equivalent ways, by the mean parameter $\bm{\mu}$ and by the natural parameter $\bm{\theta}$, so that $\widetilde{\mathcal{L}}(\bm{\mu})=\mathcal{L}(\bm{\theta})$, mirror descent for maximizing the ELBO over the mean parameter, using the Bregman divergence (bregman1967relaxation) $B_H(\bm{\mu},\bm{\mu}_t)=H(\bm{\mu})-H(\bm{\mu}_t)-\nabla H(\bm{\mu}_t)^\top(\bm{\mu}-\bm{\mu}_t)$ induced by the negative entropy function $H(\cdot)$, is equivalent to natural gradient descent over the natural parameter $\bm{\theta}$.
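Theorem 3.1 can be checked numerically in the simplest exponential family. The sketch assumes a one-dimensional Bernoulli family, which is an illustration and not the GP setting of the paper. Here $\nabla H(\mu)$ is the logit and the Fisher information of the natural parameter is $\mu(1-\mu)$. The mirror step is solved directly from its Bregman-proximal definition by bisection, while the natural gradient step uses the Fisher preconditioner, and the two trajectories should coincide. The objective `grad_objective_mu` with its optimum `TARGET` is a hypothetical stand-in for the ELBO.

```python
import math

# Illustrative assumption: a 1-D Bernoulli family q(x) = p^x (1 - p)^(1 - x).
# Natural parameter theta = logit(p), mean parameter mu = p,
# log-partition A(theta) = log(1 + e^theta); its convex conjugate is the
# negative entropy H(mu) = mu log mu + (1 - mu) log(1 - mu), with grad H = logit.


def sigmoid(theta):
    return 1.0 / (1.0 + math.exp(-theta))


def logit(mu):
    return math.log(mu / (1.0 - mu))


TARGET = 0.8  # hypothetical optimum of the stand-in objective


def grad_objective_mu(mu):
    """Gradient of a hypothetical concave objective -(mu - TARGET)^2."""
    return -2.0 * (mu - TARGET)


def mirror_ascent_step(mu_t, rho):
    """Solve argmax_mu g * mu - B_H(mu, mu_t) / rho by bisection.

    The derivative g - (logit(mu) - logit(mu_t)) / rho is strictly
    decreasing in mu, so it has exactly one root in (0, 1).
    """
    g = grad_objective_mu(mu_t)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g - (logit(mid) - logit(mu_t)) / rho > 0:
            lo = mid  # proximal objective still increasing: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def natural_gradient_step(theta_t, rho):
    """theta + rho * F(theta)^{-1} * dL/dtheta, with F = A''(theta) = mu (1 - mu)."""
    mu = sigmoid(theta_t)
    fisher = mu * (1.0 - mu)
    grad_theta = grad_objective_mu(mu) * fisher  # chain rule: dmu/dtheta = fisher
    return theta_t + rho * grad_theta / fisher


rho, mu, theta = 0.5, 0.3, logit(0.3)
for t in range(5):
    mu = mirror_ascent_step(mu, rho)
    theta = natural_gradient_step(theta, rho)
    print(f"step {t}: mirror mu={mu:.8f}  natural-gradient mu={sigmoid(theta):.8f}")
```

Because $\nabla H$ maps $\mu$ to $\theta$, the mirror step reads $\theta_{t+1}=\theta_t+\rho\,\nabla_\mu\widetilde{\mathcal{L}}$, and the Fisher factor in the natural gradient cancels the chain-rule factor $\partial\mu/\partial\theta$. In this example the two updates therefore agree exactly at every step, not only to first order.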

Figures (4)

Figure 1: Overview of the MD-BSFC training process. The diagram illustrates the bi-level optimization, in which mirror descent is applied iteratively for VI in the inner loop and hyperparameters are tuned in the outer loop.
Figure 2: Reliability diagrams on 5-shot classification with ECE and MCE metrics. MI denotes mini-ImageNet and Omni denotes Omniglot. Results are computed on 3,000 test tasks.
Figure 3: ELBO curves over 30 iterations on the CUB and mini-ImageNet datasets for 1-shot and 5-shot classification. Our method with mirror descent (MD) learns faster than the vanilla method with gradient descent (GD) in both settings.
Figure 4: Loss curves over 30 iterations on the CUB and mini-ImageNet datasets for 1-shot and 5-shot classification. Our method with mirror descent (MD) learns faster than the vanilla method with gradient descent (GD) in both settings.
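The reliability diagrams in Figure 2 report ECE (expected calibration error) and MCE (maximum calibration error). Predictions are grouped into confidence bins. ECE is the frequency-weighted average of the gap between accuracy and mean confidence in each bin, and MCE is the largest such gap. The sketch below assumes 10 equal-width bins and takes confidence to be the predicted probability of the chosen class. The paper's exact binning is not stated here, and `calibration_errors` is a hypothetical helper.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins on [0, 1].

    confidences: predicted probability of the chosen class, one per query.
    correct: whether each prediction matched the label.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than out of range.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing to either metric
        acc = sum(ok for _, ok in members) / len(members)
        avg_conf = sum(c for c, _ in members) / len(members)
        gap = abs(acc - avg_conf)
        ece += len(members) / n * gap  # weighted by bin frequency
        mce = max(mce, gap)            # worst-case bin gap
    return ece, mce


ece, mce = calibration_errors([0.95, 0.92, 0.85, 0.62, 0.55],
                              [True, True, False, True, False])
print(f"ECE={ece:.3f}  MCE={mce:.3f}")
```

To evaluate on many tasks, as in the 3,000 test tasks of Figure 2, pool the confidences and correctness flags of all query predictions before calling the helper. Averaging per-task ECE values would be a different quantity.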

Theorems & Definitions (4)

Theorem 3.1: raskutti2015information
Theorem 3.2: khan2017conjugate
Proof
Proof