Multi-Perspective Data Augmentation for Few-shot Object Detection
Anh-Khoa Nguyen Vu, Quoc-Truong Truong, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Tam V. Nguyen
TL;DR
This work tackles FSOD data scarcity by introducing MPAD, a multi-perspective data augmentation framework that jointly models foreground-foreground and foreground-background relations. It combines In-Context Learning for Object Synthesis (ICOS) with a Harmonic Prompt Aggregation Scheduler (HPAS) and a Background Proposal method (BAP) to generate diverse, hard samples and contextually challenging backgrounds via controllable diffusion. Empirical results on PASCAL VOC and MS COCO show MPAD consistently improves over strong baselines and state-of-the-art methods, with substantial gains in nAP50 (e.g., +17.5% on VOC) and strong improvements in 1-shot settings. The approach demonstrates that richer, prompt-enhanced diffusion synthesis and targeted background sampling can meaningfully mitigate overfitting and enhance generalization in FSOD.
Abstract
Recent few-shot object detection (FSOD) methods have focused on augmenting training data with synthetic samples for novel classes, showing promising results thanks to the rise of diffusion models. However, such synthetic datasets are often limited in diversity and representativeness because they lack awareness of typical and hard samples, especially in the context of foreground and background relationships. To tackle this issue, we propose a Multi-Perspective Data Augmentation (MPAD) framework. In terms of foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments to enhance the detail and spatial information of synthetic samples. According to the large margin principle, support samples play a vital role in defining class boundaries. Inspired by this principle, we design a Harmonic Prompt Aggregation Scheduler (HPAS) that mixes prompt embeddings at each time step of the diffusion generation process to produce hard novel samples. For foreground-background relationships, we introduce a Background Proposal method (BAP) to sample typical and hard backgrounds. Extensive experiments on multiple FSOD benchmarks demonstrate the effectiveness of our approach. Our framework significantly outperforms traditional methods, achieving an average increase of $17.5\%$ in nAP50 over the baseline on PASCAL VOC. Code is available at https://github.com/nvakhoa/MPAD.
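
To make the idea of mixing prompt embeddings at each time step more concrete, the following is a minimal sketch and not the HPAS implementation. The abstract does not specify the schedule, so the cosine-shaped weight, the `max_weight` parameter, the direction of the decay, and all function names here are assumptions chosen purely for illustration. Embeddings are plain lists of floats rather than real text-encoder outputs.

```python
import math


def mixing_weight(step, num_steps, max_weight=0.5):
    """Hypothetical schedule (an assumption, not the paper's HPAS schedule).

    The weight on the auxiliary prompt starts at max_weight and decays
    to 0 along a half cosine as the denoising steps progress.
    """
    if num_steps < 2:
        return 0.0
    progress = step / (num_steps - 1)
    return max_weight * 0.5 * (1.0 + math.cos(math.pi * progress))


def mix_embeddings(novel, auxiliary, weight):
    """Convex combination of two prompt embeddings of equal length."""
    if len(novel) != len(auxiliary):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - weight) * n + weight * a for n, a in zip(novel, auxiliary)]


def scheduled_embeddings(novel, auxiliary, num_steps):
    """Return one mixed embedding per denoising step."""
    return [
        mix_embeddings(novel, auxiliary, mixing_weight(step, num_steps))
        for step in range(num_steps)
    ]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for a novel class and a second class.
    novel_prompt = [1.0, 0.0]
    auxiliary_prompt = [0.0, 1.0]
    for step, emb in enumerate(scheduled_embeddings(novel_prompt, auxiliary_prompt, 3)):
        print(step, emb)
```

In this sketch the conditioning starts as an even blend of the two prompts and ends as the pure novel-class prompt, so a sample generated under it would lie closer to the boundary between the two classes than one conditioned on the novel prompt alone. In a real diffusion pipeline, each per-step embedding would replace the fixed text conditioning passed to the denoiser at that step.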
