Adding New Categories in Object Detection Using Few-Shot Copy-Paste

Boyang Deng, Meiyan Lin, Shoulun Long

TL;DR

The paper addresses data-efficient object detection, specifically how to add new categories to an existing detector from very few labeled images. It introduces a copy-paste-based augmentation pipeline guided by an inferred occlusion distribution, and pairs it with occlusion-aware data collection strategies, including Monte Carlo sampling, camera viewpoints, and FAIRMOT-based annotation. On the FVSS and COCO benchmarks, the approach shows that tens of labeled images can match or approach the performance of thousands when the augmentations reproduce plausible occlusion relationships, with high accuracy on unseen test data for the new categories. The work offers a practical, scalable way to extend detection systems with minimal labeling effort, reducing training costs and supporting deployment in real-world, occlusion-rich environments.

Abstract

Developing data-efficient instance detection models that can handle rare object categories remains a key challenge in computer vision. However, existing research often overlooks data collection strategies and evaluation metrics tailored to real-world scenarios involving neural networks. In this study, we systematically investigate data collection and augmentation techniques focused on object occlusion, aiming to mimic occlusion relationships observed in practical applications. Surprisingly, we find that even a simple occlusion mechanism is sufficient to achieve strong performance when introducing new object categories. Notably, by adding just 15 images of a new category to a large-scale training dataset containing over half a million images across hundreds of categories, the model achieves 95% accuracy on an unseen test set with thousands of instances of the new category.
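
To make the idea of copy-paste augmentation with a simple occlusion mechanism more concrete, the following is a minimal sketch and not the authors' pipeline. It pastes a cropped object into a scene and then pastes a second crop over part of it, so that a chosen fraction of the object's width is hidden. The function names (`paste`, `sample_occlusion_ratio`, `copy_paste_with_occlusion`), the uniform sampling of the occlusion ratio, and the right-side placement of the occluder are all assumptions made for illustration; the paper infers its occlusion distribution from data.

```python
import numpy as np


def paste(image, patch, mask, top, left):
    """Copy the masked pixels of `patch` into `image`, in place."""
    h, w = mask.shape
    region = image[top:top + h, left:left + w]  # a view into `image`
    region[mask] = patch[mask]


def sample_occlusion_ratio(rng, max_ratio=0.6):
    """Hypothetical stand-in for an inferred occlusion distribution:
    draw the hidden fraction uniformly from [0, max_ratio)."""
    return float(rng.uniform(0.0, max_ratio))


def copy_paste_with_occlusion(background, obj, obj_mask,
                              occluder, occ_mask,
                              top, left, occlusion_ratio):
    """Paste `obj`, then paste `occluder` so that it hides roughly
    `occlusion_ratio` of the object's box width, starting from the right.

    Returns the augmented image and the object's full box (x1, y1, x2, y2).
    The caller must make sure both crops fit inside the background.
    """
    out = background.copy()
    h, w = obj_mask.shape
    paste(out, obj, obj_mask, top, left)

    # Shift the occluder so it overlaps the rightmost columns of the object.
    hidden_cols = int(round(occlusion_ratio * w))
    occ_left = left + w - hidden_cols
    paste(out, occluder, occ_mask, top, occ_left)

    box = (left, top, left + w, top + h)
    return out, box


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = np.zeros((20, 40, 3), dtype=np.uint8)
    new_item = np.full((10, 10, 3), 200, dtype=np.uint8)
    neighbor = np.full((10, 10, 3), 90, dtype=np.uint8)
    full_mask = np.ones((10, 10), dtype=bool)

    ratio = sample_occlusion_ratio(rng)
    augmented, box = copy_paste_with_occlusion(
        scene, new_item, full_mask, neighbor, full_mask,
        top=5, left=5, occlusion_ratio=ratio)
    print("occlusion ratio:", round(ratio, 2), "box:", box)
```

In this sketch the label keeps the object's full box even when part of it is hidden. Whether to label the full or the visible extent is a design choice that the summary above does not specify.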

Paper Structure (11 sections, 11 figures, 5 tables)

Figures (11)

  • Figure 1: Example of a beverage-only arrangement.
  • Figure 2: Example of a beverage and snack arrangement.
  • Figure 3: Detection results for small box-shaped drinks.
  • Figure 4: Examples of low-height snacks.
  • Figure 5: Heatmaps of two categories, "guangshiboluopi" (left), and "yangzhiganlu" (right).
  • ...and 6 more figures