Coreset Selection for Object Detection

Hojun Lee, Suyoung Kim, Junhoo Lee, Jaeyoung Yoo, Nojun Kwak

TL;DR

CSOD tackles coreset selection for object detection, where images often contain multiple objects. It builds imagewise-classwise vectors by averaging the RoI features of each class within each image. It then selects images with a greedy scheme that rotates over classes and is guided by a submodular gain that balances representativeness and diversity. Across VOC, BDD100k, and COCO, CSOD consistently outperforms random selection and other baselines, and cross-architecture experiments show that subsets chosen with Faster R-CNN transfer effectively to RetinaNet and FCOS after tuning. CSOD has limitations: it does not use background features or model inter-class interactions within an image. Even so, it offers a principled, scalable approach with practical implications for efficient dataset curation, and it may extend to dataset distillation for object detection.
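
The per-class averaging described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the foreground RoI features of one image (for example, from a Faster R-CNN RoI head) have already been extracted as plain lists of floats, each paired with a class label. The function name `imagewise_classwise_vectors` and this input format are hypothetical.

```python
from collections import defaultdict


def imagewise_classwise_vectors(roi_features, roi_labels):
    """Average the foreground RoI features of each class within ONE image.

    roi_features: list of equal-length feature vectors (lists of floats),
        one per foreground RoI (hypothetical input format).
    roi_labels: list of class ids, one per RoI.
    Returns {class_id: mean feature vector of that class in this image}.
    """
    if len(roi_features) != len(roi_labels):
        raise ValueError("need exactly one class label per RoI feature")
    sums = {}                  # running per-class feature sums
    counts = defaultdict(int)  # number of RoIs seen per class
    for feat, label in zip(roi_features, roi_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        acc = sums[label]
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] += 1
    # Divide each class sum by its RoI count to get the class mean.
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


# Usage: two "car" RoIs collapse into one vector; the single "person" RoI is kept.
vecs = imagewise_classwise_vectors(
    [[1.0, 0.0], [3.0, 2.0], [0.0, 5.0]],
    ["car", "car", "person"],
)
print(vecs)  # {'car': [2.0, 1.0], 'person': [0.0, 5.0]}
```

Averaging gives every image at most one vector per class, no matter how many instances of that class it contains. This is what lets a multi-object image be compared class by class with other images.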

Abstract

Coreset selection is a method for selecting a small, representative subset of an entire dataset. It has been studied primarily for image classification, where each image is assumed to contain a single object. Coreset selection for object detection is more challenging because an image can contain multiple objects, and it remains largely unexplored. We therefore introduce a new approach, Coreset Selection for Object Detection (CSOD). For each image, CSOD generates an imagewise, classwise representative feature vector that summarizes the multiple objects of the same class. It then applies submodular optimization to these representative vectors to select a subset that accounts for both representativeness and diversity. On the Pascal VOC dataset, CSOD outperformed random selection by +6.4%p in AP$_{50}$ when selecting 200 images.
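
The selection step can be illustrated with a small greedy sketch. The TL;DR and abstract do not give the exact gain function, so this code assumes a graph-cut-style gain. A candidate's score is its summed similarity to all vectors of the current class (representativeness) minus `lam` times its summed similarity to vectors already selected for that class (redundancy). Classes take turns in round-robin order, and each turn adds the unselected image with the highest gain for the current class. The function names, the `{image_id: {class_id: vector}}` input format, the clipped cosine similarity, and the weight `lam` are all hypothetical choices for illustration, not CSOD's published implementation.

```python
import math


def clipped_cosine(u, v):
    """Cosine similarity clipped at 0 (assumption: keeps pairwise terms nonnegative)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return max(0.0, dot / (nu * nv))


def select_coreset(image_vectors, budget, lam=1.0):
    """Greedy, class-wise rotating selection (illustrative sketch, not CSOD's code).

    image_vectors: {image_id: {class_id: imagewise-classwise vector}}
    budget: number of images to select
    lam: hypothetical weight of the redundancy penalty
    """
    classes = sorted({c for vecs in image_vectors.values() for c in vecs})
    if not classes:
        return []
    # All imagewise-classwise vectors of a class form that class's ground set.
    pool = {c: [vecs[c] for vecs in image_vectors.values() if c in vecs]
            for c in classes}
    chosen_vecs = {c: [] for c in classes}  # vectors of already selected images
    selected, remaining = [], set(image_vectors)
    turn, misses = 0, 0
    # Stop at the budget, when no images remain, or after a full
    # rotation in which no class had any candidate left.
    while len(selected) < budget and remaining and misses < len(classes):
        c = classes[turn % len(classes)]
        turn += 1
        candidates = sorted(img for img in remaining if c in image_vectors[img])
        if not candidates:
            misses += 1  # this class is exhausted; move to the next class
            continue
        misses = 0

        def gain(img):
            # Marginal gain of a graph-cut-style objective for class c.
            v = image_vectors[img][c]
            rep = sum(clipped_cosine(v, u) for u in pool[c])
            red = sum(clipped_cosine(v, u) for u in chosen_vecs[c])
            return rep - lam * red

        best = max(candidates, key=gain)
        selected.append(best)
        remaining.discard(best)
        # A chosen image contributes its vector to every class it contains.
        for cls, vec in image_vectors[best].items():
            chosen_vecs[cls].append(vec)
    return selected


images = {
    "a": {"car": [1.0, 0.0]},
    "b": {"car": [0.9, 0.1], "person": [0.0, 1.0]},
    "c": {"car": [0.0, 1.0]},
    "d": {"person": [0.1, 1.0]},
}
print(select_coreset(images, budget=2))  # ['b', 'd']
```

In the usage example, "b" wins the first "car" turn because it is most similar to the other car vectors overall, and "d" is the only remaining image with a "person" vector. Clipping the cosine at zero keeps every pairwise penalty nonnegative, which is the condition under which this graph-cut-style objective is submodular. The round-robin gives every class a turn regardless of how many images contain it.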

Paper Structure (41 sections, 4 equations, 9 figures, 10 tables, 1 algorithm)

Figures (9)

  • Figure 1: The difference in coreset selection between image classification and object detection.
  • Figure 2: The forward pass of Faster R-CNN during training. The RoI features include both foreground and background regions in the forward pass.
  • Figure 3: Comparison with various selection methods. '$\#$' denotes the number of objects in the selected data.
  • Figure 4: Ratio of box sizes. We followed the size criteria provided by VOC. '$\#$' denotes the number of objects in the selected data.
  • Figure 5: Performance according to the selected image counts.
  • ...and 4 more figures