Training-Free Dataset Pruning for Instance Segmentation

Yalun Dai, Lingao Xiao, Ivor W. Tsang, Yang He

TL;DR

This work proposes a novel Training-Free Dataset Pruning (TFDP) method for instance segmentation that leverages shape and class information from image annotations to design a Shape Complexity Score (SCS), refining it into Scale-Invariant and Class-Balanced versions to address instance area variations and class imbalances.

Abstract

Existing dataset pruning techniques primarily focus on classification tasks, limiting their applicability to more complex and practical tasks like instance segmentation. Instance segmentation presents three key challenges: pixel-level annotations, instance area variations, and class imbalances, which significantly complicate dataset pruning efforts. Directly adapting existing classification-based pruning methods proves ineffective due to their reliance on a time-consuming model training process. To address this, we propose a novel Training-Free Dataset Pruning (TFDP) method for instance segmentation. Specifically, we leverage shape and class information from image annotations to design a Shape Complexity Score (SCS), refining it into Scale-Invariant (SI-SCS) and Class-Balanced (CB-SCS) versions to address instance area variations and class imbalances, all without requiring model training. We achieve state-of-the-art results on the VOC 2012, Cityscapes, and COCO datasets, generalizing well across CNN and Transformer architectures. Remarkably, our approach accelerates the pruning process by an average of 1349$\times$ on COCO compared to the adapted baselines. Source code is available at: https://github.com/he-y/dataset-pruning-for-instance-segmentation
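The abstract does not spell out the SCS formulas, so the following is only a minimal sketch of the idea under stated assumptions. It assumes an instance's perimeter can be approximated by counting the pixel edges that separate its binary mask from the background, that SCS is the perimeter-to-area ratio, and that a scale-invariant variant can be obtained by dividing the perimeter by the square root of the area. The helper names `mask_perimeter`, `shape_complexity`, and `scale_invariant_complexity` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def mask_perimeter(mask: np.ndarray) -> int:
    """Count pixel edges between foreground and background (4-neighbourhood).

    Assumption: this boundary-edge count stands in for the instance perimeter.
    """
    m = np.pad(mask.astype(np.int8), 1)  # zero border so pixels on the image edge count
    vertical = np.abs(np.diff(m, axis=0)).sum()    # transitions between adjacent rows
    horizontal = np.abs(np.diff(m, axis=1)).sum()  # transitions between adjacent columns
    return int(vertical + horizontal)


def shape_complexity(mask: np.ndarray) -> float:
    """Perimeter-to-area ratio of one binary instance mask (SCS-style score)."""
    return mask_perimeter(mask) / float(mask.sum())


def scale_invariant_complexity(mask: np.ndarray) -> float:
    """Hypothetical scale-invariant variant: perimeter / sqrt(area).

    Perimeter grows linearly and area quadratically with object size, so this
    ratio stays constant when a shape is uniformly rescaled.
    """
    return mask_perimeter(mask) / float(np.sqrt(mask.sum()))


if __name__ == "__main__":
    small = np.zeros((20, 20), dtype=bool)
    small[2:6, 2:6] = True     # 4x4 square
    large = np.zeros((20, 20), dtype=bool)
    large[2:10, 2:10] = True   # 8x8 square
    print(shape_complexity(small), shape_complexity(large))                      # 1.0 0.5
    print(scale_invariant_complexity(small), scale_invariant_complexity(large))  # 4.0 4.0
```

The toy squares illustrate why a raw perimeter-to-area ratio is biased by instance size: the same shape scores 1.0 or 0.5 depending on its area, while the assumed perimeter / sqrt(area) normalization gives both squares the same score.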

Paper Structure

This paper contains 33 sections, 14 equations, 9 figures, 13 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Comparison of $\text{AP}_{50}$ and runtime for sample ranking on the COCO dataset: our method vs. Entropy, EL2N, and CCS at 40% and 50% pruning rates. Our approach demonstrates superior efficiency and accuracy.
  • Figure 2: Visualization of the VOC 2012 dataset showing (a) variable instance areas and (b) class imbalance.
  • Figure 3: Comparison of different dataset pruning pipelines. (a) Pruning a classification dataset requires model training. (b) Our adaptation of previous methods to instance segmentation, which trains a segmentation head and computes an importance score for each pixel. (c) The proposed method, which is training-free and model-independent.
  • Figure 4: Overview of our proposed framework. We introduce the Shape Complexity Score (SCS), in which we leverage the Perimeter-to-Area ratio to represent the boundary complexity. Following this, we apply scale normalization and intra-class normalization to address the inherent scale variability and class imbalance in instance segmentation tasks.
  • Figure 5: Experiments at high pruning rates (from 60% to 90%) on the MS COCO dataset. Reported metrics include mAP and $\text{AP}_{50}$ for segmentation, along with their bounding-box counterparts, $\text{mAP}^{bb}$ and $\text{AP}_{50}^{bb}$.
  • ...and 4 more figures
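Figure 4 describes intra-class normalization for class imbalance but does not give its exact form here. The sketch below is therefore only an illustration under assumptions: each instance score is divided by the mean score of its class, an image's score is the mean of its normalized instance scores, and pruning keeps the highest-scoring images. The function names `class_balanced_image_scores` and `prune_dataset` are hypothetical, and the ranking direction is an assumption rather than the paper's confirmed rule.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Tuple

# (image_id, class_id, per-instance shape score), e.g. a perimeter/area measure.
# Scores are assumed to be positive so that class means are never zero.
Instance = Tuple[Hashable, Hashable, float]


def class_balanced_image_scores(instances: List[Instance]) -> Dict[Hashable, float]:
    """Assumed intra-class normalization: divide each score by its class mean,
    then average the normalized scores of all instances in the same image."""
    per_class: Dict[Hashable, List[float]] = defaultdict(list)
    for _, cls, score in instances:
        per_class[cls].append(score)
    class_mean = {cls: sum(v) / len(v) for cls, v in per_class.items()}

    per_image: Dict[Hashable, List[float]] = defaultdict(list)
    for image_id, cls, score in instances:
        per_image[image_id].append(score / class_mean[cls])
    return {img: sum(v) / len(v) for img, v in per_image.items()}


def prune_dataset(image_scores: Dict[Hashable, float], pruning_rate: float) -> List[Hashable]:
    """Drop floor(N * pruning_rate) images, keeping the highest-scoring ones (assumed)."""
    n_prune = int(len(image_scores) * pruning_rate)
    ranked = sorted(image_scores, key=image_scores.get, reverse=True)
    return ranked[: len(ranked) - n_prune]


if __name__ == "__main__":
    toy = [("a", "cat", 2.0), ("a", "dog", 10.0), ("b", "cat", 4.0), ("c", "dog", 30.0)]
    scores = class_balanced_image_scores(toy)
    print(prune_dataset(scores, 0.5))  # ['c', 'b']
```

In the toy data, raw "dog" scores are much larger than "cat" scores. Dividing by the per-class mean puts both classes on a comparable scale, so an image is ranked by how complex its instances are relative to their own class rather than by which class it happens to contain.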