tSF: Transformer-based Semantic Filter for Few-Shot Learning

Jinxiang Lai, Siqian Yang, Wenlong Liu, Yi Zeng, Zhongyi Huang, Wenlong Wu, Jun Liu, Bin-Bin Gao, Chengjie Wang

TL;DR

Few-shot learning suffers from weak representations for novel classes due to limited data. The authors propose a light transformer-based Semantic Filter (tSF) that redefines the Transformer inputs as $Q=f$ and $K,V=\theta$, enabling dataset-attention to encode base-set semantics into a learnable filter and transfer them to novel samples via $f' = FFN(f + \sigma(f \theta^T) \theta)$. tSF serves as a universal neck and is integrated with PatchProto for few-shot classification, while also generalizing to segmentation and detection, delivering about 2–3% gains on benchmarks with a parameter footprint under 1M. This approach demonstrates robust base-to-novel transfer and task-agnostic improvements, offering a practical, scalable solution for cross-task few-shot learning.
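The filtering step in the formula $f' = FFN(f + \sigma(f \theta^T) \theta)$ can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes $\sigma$ is a softmax over the $n$ rows of $\theta$, that no $1/\sqrt{c}$ scaling, projection matrices or normalization layers are applied (the formula shows none), and that FFN is a two-layer ReLU MLP. The helpers `softmax`, `ffn` and `tsf_forward` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ffn(x, w1, b1, w2, b2):
    # Assumed FFN: two-layer MLP with a ReLU in between.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def tsf_forward(f, theta, ffn_params):
    """Hypothetical tSF forward pass: f' = FFN(f + softmax(f theta^T) theta).

    f:     (m, c) local features of one sample (m = h*w positions); Q = f.
    theta: (n, c) learnable semantic filter; K = V = theta.
    """
    attn = softmax(f @ theta.T, axis=-1)  # (m, n) weights over the n semantic groups
    enhanced = f + attn @ theta           # add the attended filter rows back onto f
    return ffn(enhanced, *ffn_params)     # (m, c); no residual after FFN, as in the formula

# Usage with random weights (checks shapes only; no trained values).
rng = np.random.default_rng(0)
h, w, c, n, hidden = 5, 5, 64, 5, 128
f = rng.standard_normal((h * w, c))
theta = rng.standard_normal((n, c))
params = (0.1 * rng.standard_normal((c, hidden)), np.zeros(hidden),
          0.1 * rng.standard_normal((hidden, c)), np.zeros(c))
out = tsf_forward(f, theta, params)
print(out.shape)  # (25, 64)
```

Setting `theta` to zeros makes the attention uniform and the filtered term vanish, so the output reduces to `ffn(f)`, which is a convenient sanity check. Under these assumptions the filter itself adds only $n \cdot c$ parameters (320 in this toy setting).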

Abstract

Few-Shot Learning (FSL) alleviates the data shortage challenge by embedding discriminative, target-aware features among plentiful seen (base) and few unseen (novel) labeled samples. Most feature embedding modules in recent FSL methods are specially designed for a particular learning task (e.g., classification, segmentation, or object detection), which limits the utility of the embedded features. To this end, we propose a light and universal module named transformer-based Semantic Filter (tSF), which can be applied to different FSL tasks. The proposed tSF redesigns the inputs of a transformer-based structure with a semantic filter, which not only embeds the knowledge of the whole base set into the novel set but also filters semantic features for the target category. Furthermore, the number of parameters of tSF is half that of a standard transformer block (less than 1M). In the experiments, tSF boosts performance on different classic few-shot learning tasks (about 2% improvement) and, in particular, outperforms the state of the art on multiple benchmark datasets for few-shot classification.

Paper Structure (16 sections, 14 equations, 4 figures, 7 tables)

Introduction
Related Work
Transformer-based Semantic Filter (tSF)
Related Transformer
tSF Methodology
Discussions
tSF for Few-Shot Classification
Problem Definition
PatchProto Framework with tSF
Objective functions
tSF for Few-Shot Segmentation and Detection
Experiments
Few-Shot Classification
Few-Shot Semantic Segmentation
Few-Shot Object Detection
...and 1 more section

Figures (4)

Figure 1: (a) Standard Transformer layer. (b) Transformer-based Semantic Filter (tSF), where $\theta$ is the learnable semantic filter. (c) Base-to-novel transfer by tSF. After training, the semantic information of the base dataset is embedded into $\theta$; e.g., there are $n=5$ semantic groups and $\theta_1$ represents a dog-like group. Then, given a novel input sample, tSF enhances its regions that are semantically similar to $\theta$. (d) Intuition for how tSF enhances the novel input feature.
Figure 2: The tSF for few-shot learning tasks such as classification, semantic segmentation and object detection.
Figure 3: (a) The PatchProto framework with tSF inserted, for few-shot classification. (b) t-SNE visualization comparison for PatchProto+tSF, where $f'=g_{\theta}(f)$ and $g_{\theta}$ is the proposed tSF. (c) Visualizations of response maps for a novel input sample, where $R_{\theta_i}$ is the correlation vector between $(f,\theta_i)$, and the dimension $n$ of $\theta \in \mathbb{R}^{n\times c}$ is set to 5.
Figure 4: Results on miniImageNet classification showing the influence of the dimension $n$ of $\theta \in \mathbb{R}^{n\times c}$ in tSF, using the PatchProto+tSF framework with a ResNet-12 backbone and without the Rotation classifier.
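Figure 3(c) visualizes a response map $R_{\theta_i}$ for each of the $n$ filter rows. The caption only states that $R_{\theta_i}$ is the correlation between $f$ and $\theta_i$; the sketch here assumes it is the softmax attention weight from the tSF formula, reshaped back to the spatial grid. That is one plausible reading, not the paper's confirmed definition, and `response_maps` is a hypothetical helper.

```python
import numpy as np

def response_maps(f_map, theta):
    """Hypothetical per-group response maps R_theta_i for one novel sample.

    f_map: (c, h, w) backbone feature map; theta: (n, c) semantic filter.
    Returns (n, h, w): map i holds, for every spatial position, the softmax
    weight that position assigns to theta_i (an assumed definition of R_theta_i).
    """
    c, h, w = f_map.shape
    f = f_map.reshape(c, h * w).T                  # (h*w, c), row-major over (y, x)
    logits = f @ theta.T                           # (h*w, n) dot-product correlation
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)  # softmax over the n groups
    return attn.T.reshape(theta.shape[0], h, w)    # back to the spatial layout

# Toy check: position (0, 0) points along theta_1; all other positions are zero.
theta = np.eye(2, 4)          # n = 2 groups, c = 4 channels
f_map = np.zeros((4, 3, 3))
f_map[1, 0, 0] = 10.0
maps = response_maps(f_map, theta)
print(maps.shape)             # (2, 3, 3)
print(maps[:, 1, 1])          # [0.5 0.5] (zero feature, so equal weights)
```

Because each position's weights sum to 1 over the $n$ groups, `maps.argmax(axis=0)` gives the dominant semantic group per position, a compact alternative to plotting all $n$ maps.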