SENet: A Spectral Filtering Approach to Represent Exemplars for Few-shot Learning

Tao Zhang; Wu Huang

TL;DR

A shrinkage exemplar loss is proposed to replace the widely used cross entropy loss for capturing the information of individual shrinkage samples for few-shot classification.

Abstract

Prototypes are widely used to represent the internal structure of a category in few-shot learning; they were proposed as a simple inductive bias to address overfitting. However, since a prototype is normally the average of individual samples, it can represent some classes appropriately but underfits others that are better represented by exemplars. To address this problem, we propose Shrinkage Exemplar Networks (SENet) for few-shot classification. In SENet, categories are represented by the embeddings of samples that shrink towards their mean via spectral filtering. Furthermore, a shrinkage exemplar loss is proposed to replace the widely used cross-entropy loss in order to capture the information of individual shrinkage samples. Experiments were conducted on the miniImageNet, tiered-ImageNet and CIFAR-FS datasets. The experimental results demonstrate the effectiveness of the proposed method.

Paper Structure

This paper contains 24 sections, 14 equations, 4 figures, 6 tables.

Introduction
Related Work
Few-shot Learning using Metric-based Model.
Shrinkage Estimators.
Loss Objective.
Methodology
Preliminary
Prototype-based predictors.
Prototype-extended predictors.
Exemplar-based predictors.
Shrinkage Exemplar
Temperature.
Connection to Exemplar-based and Prototype-extended Predictors
Experiments
Experimental Setup
...and 9 more sections

Figures (4)

Figure 1: Comparison of the original distribution (left) with the shrinkage distribution (right) for a 3-shot task in two dimensions. Two categories are considered, the cardboard box category (red) and the guide-board category (green), and the queries are marked with blue boxes. In the original distribution, the query belonging to the cardboard box category should be predicted with the prototype model, and the query belonging to the guide-board category should be predicted with the exemplar model. In the shrinkage distribution, all the samples belonging to the same class shrink properly toward their mean. This allows us to make predictions uniformly via the similarities between the queries and the shrinkage samples.
Figure 2: Comparison of three kinds of predictors in the 3-shot, 3-dimensional case. (a) Exemplar: no filtering is applied to the support samples $s_1$, $s_2$, $s_3$ or the query sample $q$. (b) Filtering: $s_1$, $s_2$, $s_3$ and $q$ shrink towards the mean of $s_1$, $s_2$, $s_3$ after spectral filtering. (c) Prototype: $s_1$, $s_2$, $s_3$ and the projection of $q$ onto the subspace spanned by these support samples arrive at the mean after extreme filtering.
Figure 3: Robustness comparison of the prototype model ($\lambda \to \infty$), the exemplar model ($\lambda=0$) and the proposed SENet ($\lambda=10^5$) on the CIFAR-FS dataset. We show the accuracies achieved by (a) $d^{(s_1)}_{\lambda}$ and (b) $d^{(s_2)}_{\lambda}$, each compared with the other two models, for different numbers of episodes per batch, and the accuracies achieved by (c) $d^{(s_1)}_{\lambda}$ and (d) $d^{(s_2)}_{\lambda}$, each compared with the other two models, under zero-mean Gaussian noise with different variances at 8 episodes per batch.
Figure 4: The comparison of performance of the prototype model ($\lambda \to \infty$), the exemplar model ($\lambda=0$) and the proposed SENet with (a) $\lambda = 1000$ for $d^{(s_1)}_\lambda$ and (b) $\lambda = 10000$ for $d^{(s_2)}_\lambda$ with 2 episodes per batch on CIFAR-FS dataset.
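
The figure captions describe one family of predictors indexed by $\lambda$: the exemplar model at $\lambda=0$, the prototype model as $\lambda \to \infty$, and shrinkage exemplars in between. The sketch below illustrates that interpolation only and is not the filter or the loss used in the paper, neither of which is given on this page. It assumes a hypothetical shrinkage factor $1/(1+\lambda)$, which is what a Tikhonov-style low-pass filter $(I+\lambda L)^{-1}$ yields on a complete graph over the support samples with $L = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. It also leaves the query unfiltered for simplicity. The function names `class_mean`, `shrink`, `class_score` and `predict` are illustrative.

```python
import math


def class_mean(samples):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def shrink(samples, lam):
    """Shrink each sample toward the class mean.

    Assumed factor: 1 / (1 + lam). lam = 0 leaves the samples unchanged
    (exemplar case); lam = inf collapses them onto the mean (prototype case).
    """
    mu = class_mean(samples)
    if math.isinf(lam):
        return [list(mu) for _ in samples]
    keep = 1.0 / (1.0 + lam)
    return [[m + keep * (x - m) for x, m in zip(s, mu)] for s in samples]


def class_score(query, shrunk):
    """Log-sum-exp of negative squared distances to the shrunk samples."""
    logits = [-sum((q - x) ** 2 for q, x in zip(query, s)) for s in shrunk]
    top = max(logits)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logits))


def predict(query, support_by_class, lam):
    """Return the label whose shrunk support samples score highest."""
    scores = {
        label: class_score(query, shrink(samples, lam))
        for label, samples in support_by_class.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 3-shot, 2-dimensional task with made-up embeddings.
    support = {
        "box": [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]],
        "board": [[10.0, 10.0], [12.0, 10.0], [11.0, 13.0]],
    }
    for lam in (0.0, 10.0, math.inf):
        print(lam, predict([1.0, 1.0], support, lam))
```

With equal shots per class, the `lam = inf` case reduces to nearest-prototype classification, since every shrunk sample of a class coincides with its mean.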
