PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection

Yingjie Gao, Yanan Zhang, Ziyue Huang, Nanqing Liu, Di Huang

TL;DR

PS-TTL introduces a Prototype-based Soft-labels (PS) strategy that uses the similarities between low-confidence pseudo-labels and category prototypes as soft-labels. This puts otherwise discarded predictions to use and substantially mitigates the constraints posed by few-shot samples.

Abstract

In recent years, Few-Shot Object Detection (FSOD) has gained widespread attention and made significant progress due to its ability to build models with good generalization power from extremely limited annotated data. The fine-tuning-based paradigm currently dominates this field: detectors are first pre-trained on base classes with sufficient samples and then fine-tuned on novel classes with few samples. However, the scarcity of labeled samples for novel classes makes it hard to fit their data distribution precisely, which hampers performance. To address this issue, we propose a new framework for FSOD, namely Prototype-based Soft-labels and Test-Time Learning (PS-TTL). Specifically, we design a Test-Time Learning (TTL) module that employs a mean-teacher network for self-training to discover novel instances from test data, allowing detectors to learn better representations and classifiers for novel classes. Furthermore, we notice that even though relatively low-confidence pseudo-labels exhibit classification confusion, they still tend to recall foreground. We thus develop a Prototype-based Soft-labels (PS) strategy that uses the similarities between low-confidence pseudo-labels and category prototypes as soft-labels to unleash their potential, which substantially mitigates the constraints posed by few-shot samples. Extensive experiments on both the VOC and COCO benchmarks show that PS-TTL achieves state-of-the-art performance, highlighting its effectiveness. The code and model are available at https://github.com/gaoyingjay/PS-TTL.
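
As a rough illustration of the Prototype-based Soft-labels idea described in the abstract, the following minimal sketch turns the similarities between one region feature and a set of class prototypes into a soft-label distribution. It is not taken from the authors' code: the function names `cosine` and `soft_label`, the choice of cosine similarity, the softmax normalization, and the `temperature` value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def soft_label(feature, prototypes, temperature=0.1):
    """Map a feature to a probability distribution over classes.

    ASSUMPTION: similarities are cosine similarities, normalized with a
    temperature-scaled softmax. The paper may use a different formulation.
    """
    names = list(prototypes)
    logits = [cosine(feature, prototypes[n]) / temperature for n in names]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


# Hypothetical two-class example with 2-D features.
prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
label = soft_label([0.9, 0.1], prototypes)
```

Here `label` assigns most of the probability mass to `"cat"`, because the feature lies closer to that prototype, while still leaving a small share for `"dog"` instead of committing to a hard class.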

Paper Structure (28 sections, 10 equations, 6 figures, 9 tables, 1 algorithm)

Figures (6)

  • Figure 1: Motivation of FSOD with Test-Time Learning. (a) Hallucination methods suffer from the distribution gap between synthetic and real data. (b) Semi-supervised methods mine implicit novel instances from base data; however, novel instances do not always appear in base data. (c) For the first time, we propose to learn an enhanced model at test-time, effectively leveraging data of novel classes present in test data in a more realistic manner aligned with real-world applications.
  • Figure 2: Overview of the proposed Prototype-based Soft-labels and Test-Time Learning (PS-TTL) framework for FSOD. Both the student and teacher networks are first initialized from the few-shot detector and then fine-tuned on test data. The teacher network takes test data as input and generates pseudo-labels. The student network is trained with these pseudo-labels after post-processing, with $N$-way $K$-shot data as supervision signals, and updates the teacher network through an exponential moving average (EMA). A Prototype-based Soft-labels (PS) strategy maintains class prototypes and computes the feature similarity between low-confidence pseudo-labels and the class prototypes in order to replace those pseudo-labels with soft-labels.
  • Figure 3: Illustration of the missed-detection issue. In the left image, many pseudo-labels are filtered out by $\delta_{upper}$ and most objects go undetected. In the right image, when $\delta_{lower}$ is applied, some relatively low-confidence pseudo-labels are retained as high-quality implicit foreground predictions.
  • Figure 4: Qualitative visualization comparison on PASCAL VOC. The top and bottom rows show the results of DeFRCN and our PS-TTL, respectively.
  • Figure A: The performance trend of Test-Time Learning in terms of nAP50 (%) on PASCAL VOC.
  • ...and 1 more figure
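
The captions of Figures 2 and 3 mention two mechanisms that a short sketch may make more concrete: the EMA update of the teacher from the student, and the use of two confidence thresholds, $\delta_{upper}$ and $\delta_{lower}$, on the teacher's pseudo-labels. The code below is an illustrative reading, not the authors' implementation. The function names, the dictionary-based weight and detection formats, the default momentum, and the threshold values are assumptions, as is the routing of mid-confidence detections to soft-labeling.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return new teacher weights as momentum * teacher + (1 - momentum) * student.

    ASSUMPTION: weights are plain dicts of floats; a real detector would apply
    the same rule to every parameter tensor.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


def split_pseudo_labels(detections, delta_upper=0.8, delta_lower=0.3):
    """Split teacher detections by confidence score.

    ASSUMPTION: scores >= delta_upper are kept as hard pseudo-labels, scores in
    [delta_lower, delta_upper) become candidates for soft-labels, and anything
    below delta_lower is dropped. The threshold values are placeholders.
    """
    hard, soft_candidates = [], []
    for det in detections:
        if det["score"] >= delta_upper:
            hard.append(det)
        elif det["score"] >= delta_lower:
            soft_candidates.append(det)
    return hard, soft_candidates


# Hypothetical usage with toy values.
teacher = ema_update({"w": 1.0}, {"w": 0.0}, momentum=0.9)
hard, soft_candidates = split_pseudo_labels(
    [{"score": 0.95}, {"score": 0.5}, {"score": 0.1}]
)
```

With a momentum close to 1, the teacher changes slowly and smooths out noisy student updates, which is the usual motivation for mean-teacher self-training.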