Multi-Perspective Data Augmentation for Few-shot Object Detection

Anh-Khoa Nguyen Vu, Quoc-Truong Truong, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Tam V. Nguyen

TL;DR

This work tackles FSOD data scarcity by introducing MPAD, a multi-perspective data augmentation framework that jointly models foreground-foreground and foreground-background relations. It combines In-Context Learning for Object Synthesis (ICOS) with a Harmonic Prompt Aggregation Scheduler (HPAS) and a Background Proposal method (BAP) to generate diverse, hard samples and contextually challenging backgrounds via controllable diffusion. Empirical results on PASCAL VOC and MS COCO show MPAD consistently improves over strong baselines and state-of-the-art methods, with substantial gains in $nAP50$ (e.g., +17.5% on VOC) and strong improvements in 1-shot settings. The approach demonstrates that richer, prompt-enhanced diffusion synthesis and targeted background sampling can meaningfully mitigate overfitting and enhance generalization in FSOD.

Abstract

Recent few-shot object detection (FSOD) methods have focused on augmenting training data with synthetic samples for novel classes, showing promising results thanks to the rise of diffusion models. However, the diversity of such datasets is often limited in representativeness because they lack awareness of typical and hard samples, especially in the context of foreground and background relationships. To tackle this issue, we propose a Multi-Perspective Data Augmentation (MPAD) framework. In terms of foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments to enhance the detail and spatial information of synthetic samples. According to the large margin principle, support samples play a vital role in defining class boundaries. Therefore, we design a Harmonic Prompt Aggregation Scheduler (HPAS) to mix prompt embeddings at each time step of the generation process in diffusion models, producing hard novel samples. For foreground-background relationships, we introduce a Background Proposal method (BAP) to sample typical and hard backgrounds. Extensive experiments on multiple FSOD benchmarks demonstrate the effectiveness of our approach. Our framework significantly outperforms traditional methods, achieving an average increase of $17.5\%$ in nAP50 over the baseline on PASCAL VOC. Code is available at https://github.com/nvakhoa/MPAD.
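
To make the idea of mixing prompt embeddings at each time step concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the HPAS weighting function, so the half-cosine ramp in `mixing_weight` is an assumption chosen only for illustration, and the names `mixing_weight` and `mix_prompt_embeddings` are hypothetical. Embeddings are represented as plain lists of floats to keep the example self-contained.

```python
import math
from typing import List


def mixing_weight(step: int, num_steps: int) -> float:
    """Weight given to the first prompt embedding at a denoising step.

    ASSUMPTION: a half-cosine ramp from 0 to 1 over the steps. The actual
    HPAS schedule is not given in the abstract and may differ.
    """
    if num_steps < 2:
        return 1.0
    progress = step / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return 0.5 * (1.0 - math.cos(math.pi * progress))


def mix_prompt_embeddings(
    emb_a: List[float], emb_b: List[float], step: int, num_steps: int
) -> List[float]:
    """Convex combination of two prompt embeddings for one denoising step."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    w = mixing_weight(step, num_steps)
    return [w * a + (1.0 - w) * b for a, b in zip(emb_a, emb_b)]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for two prompts (illustrative values only).
    emb_a, emb_b = [1.0, 0.0], [0.0, 1.0]
    for step in range(5):
        print(step, mix_prompt_embeddings(emb_a, emb_b, step, num_steps=5))
```

In a real pipeline, the mixed embedding would replace the single text-conditioning vector passed to the diffusion model at each step. Under this assumed schedule, early steps are dominated by `emb_b` and late steps by `emb_a`.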

Paper Structure

This paper contains 21 sections, 6 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: t-SNE visualization of novel synthetic samples and base real samples in Novel Set 1 of PASCAL VOC. We only generate synthetic samples for three novel classes ("bird", "bus", "cow") and use real samples for three base classes ("aeroplane", "train", "horse"). Typical and hard samples in novel classes are created using ICOS and HPAS, respectively. Base real samples are considered typical samples.
  • Figure 2: The overall framework. To exploit the ability of controllable diffusion models for FSOD, we propose a novel data augmentation method that incorporates multiple perspectives to generate diverse data. Our method includes ICOS, BAP, and HPAS. ICOS aims to deeply explore the attributes of novel classes and diversify the prompts for controllable diffusion models. BAP selects hard and typical backgrounds, while HPAS generates hard (mixed) instances.
  • Figure 3: In-context learning technique for exploring (a) attributes and (b) fine-grained object categories of a novel class given a sample. The input $\texttt{[CLASSNAME]}$ is replaced by class name $c \in C_{novel}$.
  • Figure 4: Visualization of the weighted values of the Harmonic Prompt Aggregation Scheduler across the timesteps of controllable diffusion.
  • Figure 5: Visualization of the mixed instances produced by the Harmonic Prompt Aggregation Scheduler during the data generation process in the controllable diffusion model.
  • ...and 5 more figures