Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection

Ruoyu Chen; Hua Zhang; Jingzhi Li; Li Liu; Zhen Huang; Xiaochun Cao

TL;DR

This work addresses few-shot object detection (FSOD) by using embedding side information to build a knowledge matrix that encodes semantic relations between base and novel categories. It combines a Contextual Semantic Supervised Contrastive Learning (CCL) branch, a memory prototype bank, and side-information-guided counterfactual data augmentation to reduce feature-space bias and overfitting. The approach yields consistent improvements across five benchmarks (PASCAL VOC, MS COCO, LVIS V1, FSOD-1K, FSVOD-500) and two backbone families (ResNet and ViT), achieving state-of-the-art results in many settings. The combination of semantic-aware contrastive learning and interpretable augmentation offers a practical path to more robust FSOD in diverse, data-scarce scenarios.

Abstract

The objective of few-shot object detection (FSOD) is to detect novel objects from only a few training samples. The core challenge is to construct, on top of the base-category feature space, a generalized feature space for novel categories with limited data, so that the learned detection model can adapt to unknown scenarios. Because samples of novel categories are insufficient, two issues remain: (1) the features of a novel category are easily represented implicitly by the features of the base categories, leading to inseparable classifier boundaries; (2) the few samples of a novel category cannot fully represent its distribution, so model fine-tuning is prone to overfitting. To address these issues, we introduce side information to alleviate the negative influences arising from both the feature-space and sample viewpoints, and formulate a novel generalized feature representation learning method for FSOD. Specifically, we first use embedding side information to construct a knowledge matrix that quantifies the semantic relationship between the base and novel categories. Then, to strengthen the discrimination between semantically similar categories, we develop contextual semantic supervised contrastive learning, which embeds side information. Furthermore, to prevent overfitting caused by sparse samples, we introduce a side-information-guided region-aware masked module that augments sample diversity: it uses counterfactual explanation to find and discard biased information that discriminates between similar categories, and further refines the discriminative representation space. Extensive experiments using ResNet and ViT backbones on the PASCAL VOC, MS COCO, LVIS V1, FSOD-1K, and FSVOD-500 benchmarks demonstrate that our model outperforms previous state-of-the-art methods, significantly improving FSOD performance in most shots/splits.
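The knowledge-matrix step can be illustrated with a minimal sketch. The abstract does not specify the embedding source or the similarity measure, so everything below is an assumption made for illustration: the toy 3-dimensional vectors, the category names, the use of cosine similarity, and the helper names `cosine`, `knowledge_matrix`, and `closest_base` are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knowledge_matrix(embeddings, names):
    """Pairwise similarity table: result[a][b] is the similarity of a and b."""
    return {a: {b: cosine(embeddings[a], embeddings[b]) for b in names}
            for a in names}

# Toy category embeddings (made-up values, NOT real word vectors).
embeddings = {
    "horse": [1.0, 0.0, 0.0],  # base category
    "car":   [0.0, 1.0, 0.0],  # base category
    "cow":   [0.8, 0.0, 0.6],  # novel category
    "bus":   [0.0, 0.6, 0.8],  # novel category
}
base = ["horse", "car"]
novel = ["cow", "bus"]

# Assumed form of the knowledge matrix: similarities over all categories.
K = knowledge_matrix(embeddings, base + novel)

def closest_base(novel_name):
    """Base category that is semantically closest to the given novel one."""
    return max(base, key=lambda b: K[novel_name][b])

for n in novel:
    print(n, "->", closest_base(n), round(K[n][closest_base(n)], 3))
```

In this toy setup, a novel category with a high similarity to a base category (here `cow` and `horse`) is the kind of pair whose features risk being confused, which is the case the contrastive branch is described as targeting.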
