Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification

Tianjun Ke; Haoqun Cao; Zenan Ling; Feng Zhou

TL;DR

The logistic-softmax likelihood is revisited and redesigned so that a temperature parameter controls the a priori confidence level. Theoretical and empirical results show that softmax can be viewed as a special case of logistic-softmax and that logistic-softmax induces a larger family of data distributions than softmax.

Abstract

Meta-learning has demonstrated promising results in few-shot classification (FSC) by learning to solve new problems using prior knowledge. Bayesian methods are effective at characterizing uncertainty in FSC, which is crucial in high-risk fields. In this context, the logistic-softmax likelihood is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification due to its conditional conjugacy property. However, the theoretical properties of logistic-softmax are not well understood, and previous research indicated that the inherent uncertainty of logistic-softmax leads to suboptimal performance. To mitigate these issues, we revisit and redesign the logistic-softmax likelihood, which enables control of the a priori confidence level through a temperature parameter. Furthermore, we theoretically and empirically show that softmax can be viewed as a special case of logistic-softmax and that logistic-softmax induces a larger family of data distributions than softmax. Using the modified logistic-softmax, we integrate the data augmentation technique into the deep-kernel-based Gaussian process meta-learning framework and derive an analytical mean-field approximation for task-specific updates. Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets. Code is publicly available at https://github.com/keanson/revisit-logistic-softmax.
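The two comparative claims in the abstract can be illustrated numerically. The sketch below assumes that the tempered logistic-softmax has the form $p(y=c \mid \mathbf{f}) = \sigma(f_c/\tau) / \sum_{c'} \sigma(f_{c'}/\tau)$, with $\sigma$ the logistic sigmoid. This definition and the function names are assumptions for illustration, not code from the linked repository. When every logit is very negative, $\sigma(x) \approx e^x$, so logistic-softmax nearly coincides with softmax at the same temperature. When several logits are positive and $\tau$ is small, logistic-softmax splits probability roughly evenly among them, whereas tempered softmax concentrates on the largest one.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def logistic_softmax(f, tau=1.0):
    # Assumed form: p(y=c | f) = sigmoid(f_c / tau) / sum_c' sigmoid(f_c' / tau),
    # normalized in log space so that very negative logits do not underflow.
    logs = [log_sigmoid(fc / tau) for fc in f]
    m = max(logs)
    w = [math.exp(lc - m) for lc in logs]
    total = sum(w)
    return [wc / total for wc in w]


def softmax(f, tau=1.0):
    # Tempered softmax, shifted by the largest logit for numerical stability.
    z = [fc / tau for fc in f]
    m = max(z)
    w = [math.exp(zc - m) for zc in z]
    total = sum(w)
    return [wc / total for wc in w]


if __name__ == "__main__":
    neg = [-10.0, -11.0, -12.0]
    print(logistic_softmax(neg))  # close to softmax(neg)
    print(softmax(neg))
    pos = [3.0, 2.0, -5.0]
    print(logistic_softmax(pos, tau=0.1))  # mass split between the two positive logits
    print(softmax(pos, tau=0.1))  # mass concentrated on the largest logit
```

Under this assumed definition, the all-negative regime is where softmax behaves as a special case, and the positive regime is where logistic-softmax can express distributions that softmax cannot.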

Paper Structure (35 sections, 3 theorems, 45 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Preliminaries
Logistic-softmax with Temperature
Definition of Logistic-softmax with Temperature
Comparison of Logistic-softmax and Softmax with Temperature
Further Discussions of Potential Applications
Logistic-softmax with Temperature in Bayesian Meta-learning
Framework of Bayesian Meta-learning
Task-level Bayesian Inference
Mean-field Approximation
Meta-level Optimization
Prediction
Experiments
Few-shot Classification and Domain Transfer
Uncertainty Quantification
...and 20 more sections

Key Result

Theorem 3.1

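Denote the logistic-softmax function with temperature as $\operatorname{LS}(\mathbf{f}_n, \tau)$ and define $I := \{ i : f_n^i > 0 \} \subset [C]$. Then
$$\lim_{\tau \to 0^+} \operatorname{LS}(\mathbf{f}_n, \tau) = \begin{cases} \frac{1}{|I|} \sum_{i \in I} \bm{e}_i, & I \neq \emptyset, \\ \bm{e}_{\arg\max_i f_n^i}, & I = \emptyset, \end{cases}$$
where $\bm{e}_c \in \mathbb{R}^C$ is the one-hot vector with a $1$ in its $c$-th coordinate.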
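The low-temperature behaviour described in Theorem 3.1 can be checked numerically. Under the assumed definition $\operatorname{LS}(\mathbf{f}, \tau)_c = \sigma(f^c/\tau) / \sum_{c'} \sigma(f^{c'}/\tau)$, our reading is that as $\tau \to 0^+$ the output approaches the uniform distribution over the positive coordinates $I$ when $I$ is nonempty, and the one-hot vector $\bm{e}_c$ of the largest coordinate otherwise. The sketch below compares the two at shrinking temperatures. The helper names, including `theorem_3_1_limit`, are hypothetical, and ties in the largest coordinate are ignored because the one-hot limit is only well defined for a unique maximizer.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def logistic_softmax(f, tau):
    # Assumed form: sigmoid(f_c / tau) / sum_c' sigmoid(f_c' / tau), normalized in log space.
    logs = [log_sigmoid(fc / tau) for fc in f]
    m = max(logs)
    w = [math.exp(lc - m) for lc in logs]
    total = sum(w)
    return [wc / total for wc in w]


def theorem_3_1_limit(f):
    # Hypothetical helper: uniform weight on I = {i : f_i > 0} if I is nonempty,
    # otherwise the one-hot vector of the (assumed unique) largest coordinate.
    C = len(f)
    positive = [i for i in range(C) if f[i] > 0]
    if positive:
        return [1.0 / len(positive) if i in positive else 0.0 for i in range(C)]
    c_star = max(range(C), key=lambda i: f[i])
    return [1.0 if i == c_star else 0.0 for i in range(C)]


def max_gap(f, tau):
    # Largest coordinate-wise distance between LS(f, tau) and the limit.
    return max(abs(a - b) for a, b in zip(logistic_softmax(f, tau), theorem_3_1_limit(f)))


if __name__ == "__main__":
    for f in ([2.0, 0.5, -1.0, -3.0], [-1.0, -2.0, -0.5]):
        for tau in (1.0, 0.1, 0.01):
            print(f, tau, max_gap(f, tau))
```

The gap shrinks as $\tau$ decreases in both regimes. The first vector has two positive entries, so its limit splits mass evenly between them. The second vector is entirely negative, so its limit is one-hot on the largest entry.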

Figures (5)

Figure 1: Diagram representing the features and relationship of logistic-softmax and softmax.
Figure 1: Confidence ($\max _c p(y=c \mid \mathbf{f})$) histogram and kernel density estimate for randomly generated function samples $f_c \sim \mathcal{N}(-5,1)$. Output probabilities are normalized for $C = 5$.
Figure 2: Plot of $p(y = 1 \mid \mathbf{f})$ with $f_3$ clamped to $-100$. Separate zoomed-in plots of softmax and logistic-softmax are shown in the second row. In the upper-right region (where both $f_1$ and $f_2$ are greater than $0$), logistic-softmax exhibits probability patterns that softmax cannot model. In the lower-left region (where both $f_1$ and $f_2$ are less than $0$), logistic-softmax closely approximates softmax at every temperature and every location simultaneously.
Figure 2: Line plots of average 1-shot accuracy and standard deviation on 5-way few-shot classification for different numbers of steps. All other experimental settings are identical across step counts. Results are evaluated over 5 batches of 600 episodes with different random seeds.
Figure 3: Reliability diagrams on 5-shot classification with expected calibration error (ECE) and maximum calibration error (MCE) metrics. Mini denotes the mini-ImageNet dataset, and DT denotes the domain transfer task of CUB $\rightarrow$ mini-ImageNet. Results are computed on 3,000 test tasks.
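Figure 3 reports expected calibration error (ECE) and maximum calibration error (MCE). Both are standard binned metrics: predictions are grouped by top-1 confidence into equal-width bins $B_m$, ECE is $\sum_m \frac{|B_m|}{n} \lvert \operatorname{acc}(B_m) - \operatorname{conf}(B_m) \rvert$, and MCE is the largest such gap over non-empty bins. A minimal sketch follows. The function name, the default of 10 bins, and the half-open bin edges are assumptions and need not match the evaluation code behind the figure.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) for top-1 predictions using equal-width confidence bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equally long")
    n = len(confidences)
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    acc_sums = [0.0] * n_bins
    for p, ok in zip(confidences, correct):
        # Bin b covers [b / n_bins, (b + 1) / n_bins); p = 1.0 falls in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += p
        acc_sums[b] += 1.0 if ok else 0.0
    ece, mce = 0.0, 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins contribute to neither metric
        gap = abs(acc_sums[b] / counts[b] - conf_sums[b] / counts[b])
        ece += counts[b] / n * gap  # ECE: gap weighted by bin occupancy
        mce = max(mce, gap)  # MCE: worst-case gap over non-empty bins
    return ece, mce


if __name__ == "__main__":
    conf = [0.85, 0.85, 0.85, 0.85, 0.65, 0.65]
    hit = [True, True, True, False, True, False]
    print(calibration_errors(conf, hit))
```

In this toy input, the 0.85 bin is 75% accurate and the 0.65 bin is 50% accurate, so both bins are overconfident; ECE weights their gaps by occupancy, while MCE reports the larger of the two.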

Theorems & Definitions (3)

Theorem 3.1
Theorem 3.2
Theorem 3.3
