tSF: Transformer-based Semantic Filter for Few-Shot Learning

Jinxiang Lai, Siqian Yang, Wenlong Liu, Yi Zeng, Zhongyi Huang, Wenlong Wu, Jun Liu, Bin-Bin Gao, Chengjie Wang

TL;DR

Few-shot learning suffers from weak representations for novel classes due to limited data. The authors propose a light transformer-based Semantic Filter (tSF) that redefines the Transformer inputs as $Q=f$ and $K,V=\theta$, enabling dataset-attention to encode base-set semantics into a learnable filter and transfer them to novel samples via $f' = FFN(f + \sigma(f \theta^T) \theta)$. tSF serves as a universal neck and is integrated with PatchProto for few-shot classification, while also generalizing to segmentation and detection, delivering about 2–3% gains on benchmarks with a parameter footprint under 1M. This approach demonstrates robust base-to-novel transfer and task-agnostic improvements, offering a practical, scalable solution for cross-task few-shot learning.
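The update $f' = FFN(f + \sigma(f \theta^T) \theta)$ can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $\sigma$ is a row-wise softmax, treats $f$ as a flattened feature map of shape `(hw, c)`, and stands in a hypothetical two-layer ReLU network with random weights for the FFN. The names `tsf_forward`, `w1`, `w2` and all sizes are chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tsf_forward(f, theta, w1, w2):
    """Sketch of f' = FFN(f + softmax(f @ theta.T) @ theta).

    f:      (hw, c) input features, used as the query Q
    theta:  (n, c)  learnable semantic filter, used as both K and V
    w1, w2: weights of an assumed two-layer ReLU FFN (hypothetical)
    """
    attn = softmax(f @ theta.T, axis=-1)   # (hw, n): affinity of each position to each filter row
    enhanced = f + attn @ theta            # (hw, c): residual plus filter-weighted semantics
    hidden = np.maximum(enhanced @ w1, 0)  # (hw, d_hidden)
    return hidden @ w2                     # (hw, c)

# Toy sizes (assumed): a 5x5 feature map, 64 channels, n = 5 filter rows.
rng = np.random.default_rng(0)
hw, c, n, d_hidden = 25, 64, 5, 128
f = rng.standard_normal((hw, c))
theta = rng.standard_normal((n, c))
w1 = rng.standard_normal((c, d_hidden)) * 0.1
w2 = rng.standard_normal((d_hidden, c)) * 0.1

f_prime = tsf_forward(f, theta, w1, w2)
print(f_prime.shape)
```

Because $\theta$ is a parameter rather than a projection of the input, the same filter is applied to every sample, which is what lets semantics learned on the base set act on novel inputs.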

Abstract

Few-Shot Learning (FSL) alleviates the data shortage challenge by embedding discriminative, target-aware features learned from plentiful seen (base) and few unseen (novel) labeled samples. Most feature embedding modules in recent FSL methods are specially designed for their corresponding learning tasks (e.g., classification, segmentation, and object detection), which limits the utility of the embedded features. To this end, we propose a light and universal module named transformer-based Semantic Filter (tSF), which can be applied to different FSL tasks. The proposed tSF redesigns the inputs of a transformer-based structure with a semantic filter, which not only embeds knowledge from the whole base set into the novel set but also filters semantic features for the target category. Furthermore, tSF has half the parameters of a standard transformer block (less than 1M). In experiments, tSF boosts performance across different classic few-shot learning tasks (about 2% improvement) and, in particular, outperforms the state of the art on multiple benchmark datasets for few-shot classification.

Paper Structure (16 sections, 14 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: (a) Standard Transformer layer. (b) Transformer-based Semantic Filter (tSF), where ${\theta}$ is the learnable semantic filter. (c) Base-to-novel transfer by tSF. After training, the semantic information of the base dataset is embedded into ${\theta}$; e.g., there are $n=5$ semantic groups and ${\theta}_1$ represents a dog-like group. Then, given a novel input sample, tSF enhances the regions that are semantically similar to ${\theta}$. (d) Intuition for how tSF enhances a novel input feature.
  • Figure 2: The tSF for few-shot learning tasks such as classification, semantic segmentation and object detection.
  • Figure 3: (a) The PatchProto framework with tSF inserted for few-shot classification. (b) t-SNE visualization comparison for PatchProto+tSF, where ${f^{'}=g_{\theta}(f)}$ and ${g_{\theta}}$ is the proposed tSF. (c) Visualizations of the response maps for a novel input sample, where ${R_{\theta_i}}$ is the correlation vector between ${(f,\theta_i)}$, and the dimension ${n}$ of ${\theta \in \mathbb{R}^{n\times c}}$ is set to 5.
  • Figure 4: Influence of the dimension ${n}$ of ${\theta \in \mathbb{R}^{n\times c}}$ in tSF on miniImageNet classification results, using the PatchProto+tSF framework with a ResNet-12 backbone and without the Rotation classifier.
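
The response maps described in the Figure 3(c) caption can be illustrated with a short sketch. It assumes that the "correlation vector" between $f$ and $\theta_i$ is a plain dot product at each spatial position, reshaped back onto the feature-map grid; the caption does not state the exact normalization, so that choice, the helper name `response_maps`, and the toy sizes are all assumptions.

```python
import numpy as np

def response_maps(f, theta, h, w):
    """Return an (n, h, w) array of per-filter response maps.

    f:     (h*w, c) flattened feature map, row index = y * w + x
    theta: (n, c)   semantic filter
    Slice i holds the dot product between theta[i] and the feature at each position
    (an assumed definition of the correlation).
    """
    assert f.shape[0] == h * w, "f must be a flattened h x w feature map"
    corr = f @ theta.T                             # (hw, n)
    return corr.T.reshape(theta.shape[0], h, w)    # (n, h, w)

# Toy example with n = 5 filter rows, matching the caption's setting.
rng = np.random.default_rng(1)
h, w, c, n = 4, 6, 32, 5
f = rng.standard_normal((h * w, c))
theta = rng.standard_normal((n, c))

maps = response_maps(f, theta, h, w)
print(maps.shape)
```

Each slice `maps[i]` can then be upsampled and overlaid on the input image to see which regions respond to the i-th row of $\theta$.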