Object-Centric Data Synthesis for Category-level Object Detection

Vikhyat Agarwal, Jiayi Cora Guo, Declan Hoban, Sissi Zhang, Nicholas Moran, Peter Cho, Srilakshmi Pattabiraman, Shantanu Joshi

TL;DR

The paper tackles the challenge of data scarcity in category-level object detection by proposing an object-centric data synthesis setting that leverages isolated object captures (2D masks or 3D models). It systematically compares four data augmentation methods (Cut-Paste, Diffusion Copy-Paste, 3D Random Placement, and 3D Copy-Paste) in a data-constrained evaluation on the PACE dataset, highlighting the benefits of context-aware and diffusion-informed synthesis. Key findings show that Diffusion Copy-Paste and 3D Random Placement yield robust improvements in low-data regimes and that combining multiple synthetic methods can further boost performance, with occlusion-aware techniques providing additional gains. The work demonstrates practical value for deploying detectors on long-tail or industrial object categories where large labeled datasets are unavailable, and points to future work on richer occlusion modeling and broader long-tail coverage.
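As a rough illustration of the simplest method, Cut-Paste, the sketch below composites masked object crops onto a background at random positions and derives bounding boxes from the masks. It also tracks how much of each earlier object is hidden by later pastes and drops heavily occluded instances, which is one simple way to keep annotations consistent with occlusion. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the helper name `paste_objects`, the hard binary masks (no edge blending), uniform random placement, and the `min_visible` threshold of 0.25 are all hypothetical choices.

```python
import numpy as np


def paste_objects(background, objects, rng, min_visible=0.25):
    """Composite masked object crops onto a copy of `background` (hypothetical helper).

    background: (H, W, 3) uint8 image.
    objects: list of (patch, mask, label) with patch (h, w, 3) uint8 and mask (h, w) bool.
    Objects are pasted in order, so later objects occlude earlier ones.
    Returns the composite and a list of (label, (x0, y0, x1, y1)) half-open boxes,
    keeping only objects whose visible fraction is at least `min_visible`.
    """
    image = background.copy()
    H, W = image.shape[:2]
    placed = []  # (visible mask in image coordinates, unoccluded area, label)
    for patch, mask, label in objects:
        h, w = mask.shape
        if h > H or w > W:
            continue  # object does not fit on this background
        y = int(rng.integers(0, H - h + 1))  # uniform top-left corner (assumption)
        x = int(rng.integers(0, W - w + 1))
        region = image[y:y + h, x:x + w]  # a view, so writes go into `image`
        region[mask] = patch[mask]  # hard binary mask, no edge blending
        full = np.zeros((H, W), dtype=bool)
        full[y:y + h, x:x + w] = mask
        for visible, _, _ in placed:
            visible &= ~full  # in-place: pixels under the new object become hidden
        placed.append((full, int(mask.sum()), label))

    annotations = []
    for visible, area, label in placed:
        n_visible = int(visible.sum())
        if area == 0 or n_visible == 0 or n_visible / area < min_visible:
            continue  # drop empty or heavily occluded instances
        ys, xs = np.nonzero(visible)
        # Box around the visible pixels only; x1 and y1 are exclusive.
        box = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
        annotations.append((label, box))
    return image, annotations
```

Because later pastes overwrite earlier pixels, an object pasted last can hide an earlier one completely, in which case the earlier object receives no box rather than a box around pixels that are no longer visible.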

Abstract

Deep learning approaches to object detection have achieved reliable detection of specific object classes in images. However, extending a model's detection capability to new object classes requires large amounts of annotated training data, which is costly and time-consuming to acquire, especially for long-tailed classes with insufficient representation in existing datasets. Here, we introduce the object-centric data setting, in which limited data is available in the form of object-centric captures (multi-view images or 3D models), and systematically evaluate four data synthesis methods for fine-tuning object detection models on novel object categories in this setting. The approaches are based on simple image processing techniques, 3D rendering, and image diffusion models, and use object-centric data to synthesize realistic, cluttered images with varying contextual coherence and complexity. We assess how these methods enable models to achieve category-level generalization on real-world data, and demonstrate significant performance boosts within this data-constrained experimental setting.
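The 3D-based methods start from object models rather than 2D masks, so one core step before rendering is choosing object poses. The sketch below shows one generic way to do this: it draws uniformly distributed rotations with Shoemake's unit-quaternion method and rejection-samples translations so that each object's bounding sphere stays inside a box and does not intersect the others. It is an illustrative assumption, not the paper's 3D Random Placement pipeline; `random_rotation`, `place_objects`, the bounding-sphere approximation, and the axis-aligned placement box are hypothetical, and rendering itself is omitted.

```python
import numpy as np


def random_rotation(rng):
    """Uniformly distributed 3x3 rotation matrix via Shoemake's unit-quaternion method."""
    u1, u2, u3 = rng.random(3)
    a, b = np.sqrt(1.0 - u1), np.sqrt(u1)
    x, y, z, w = (a * np.sin(2 * np.pi * u2), a * np.cos(2 * np.pi * u2),
                  b * np.sin(2 * np.pi * u3), b * np.cos(2 * np.pi * u3))
    # Standard conversion from a unit quaternion (w, x, y, z) to a rotation matrix.
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def place_objects(radii, bounds_min, bounds_max, rng, max_tries=100):
    """Sample one random pose (R, t) per object (hypothetical helper).

    Each object is approximated by a bounding sphere of the given radius; centers are
    rejection-sampled so spheres stay inside the axis-aligned box and do not intersect.
    """
    lo = np.asarray(bounds_min, dtype=float)
    hi = np.asarray(bounds_max, dtype=float)
    poses = []
    for r in radii:
        if np.any(hi - lo < 2 * r):
            raise ValueError("object is too large for the placement box")
        for _ in range(max_tries):
            t = rng.uniform(lo + r, hi - r)  # keeps the whole sphere inside the box
            # Accept only if this sphere is disjoint from every sphere placed so far.
            if all(np.linalg.norm(t - tc) >= r + rc for (_, tc), rc in zip(poses, radii)):
                break
        else:
            raise RuntimeError("could not place object without overlap")
        poses.append((random_rotation(rng), t))
    return poses
```

A renderer would then apply each (R, t) to the corresponding model before compositing over a background. The sphere test is conservative: it guarantees objects never interpenetrate, but it also rules out some tightly packed arrangements that a mesh-level collision check would allow.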

Paper Structure

This paper contains 19 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Examples from the Cut-Paste implementation of Dwibedi et al. (2017).
  • Figure 2: Some examples of the Diffusion Copy-Paste image generation technique.
  • Figure 3: Pipeline for synthetic data generation for Diffusion Copy-Paste.
  • Figure 4: From HDRI background to synthetic data.
  • Figure 5: A single can viewed from multiple angles.
  • ...and 5 more figures