DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, Chunhua Shen

TL;DR

DiverGen addresses the data-hungry nature of instance segmentation by examining how generative data expands the data distribution a model can learn and by explicitly increasing the diversity of that data. It introduces Generative Data Diversity Enhancement (GDDE) along three dimensions (category, prompt, and generative model) and implements a four-stage pipeline (instance generation, annotation, filtration, augmentation) to construct high-quality synthetic datasets without relying on uncontrolled web data. Key components are SAM-background (SAM-bg) for more complete mask annotations and CLIP inter-similarity for more reliable data filtration, combined with multiple generative models (Stable Diffusion and DeepFloyd-IF) and more varied prompts (manually designed plus ChatGPT-generated). On LVIS, DiverGen outperforms the strong X-Paste baseline by +1.1 box AP and +1.1 mask AP over all categories and by +1.9 box AP and +2.5 mask AP on rare categories, and the gains continue as the synthetic data scales to millions of examples. These results offer practical guidance for using diversified generative data in large-scale segmentation pipelines.
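The three diversity dimensions named above (category, prompt, generative model) can be pictured as a grid of generation jobs. The sketch below is only an illustration of that idea: the category names, the prompt templates, and the `build_generation_plan` helper are hypothetical and are not taken from the paper. Only the two model names come from the summary.

```python
from itertools import product

# Hypothetical inputs: category names, prompt templates, and the plan layout
# are illustrative assumptions, not the authors' configuration.
CATEGORIES = ["sofa", "giraffe"]
PROMPT_TEMPLATES = [
    "a photo of a single {category}",           # stand-in for a manually designed prompt
    "a {category} in warm light, studio shot",  # stand-in for a ChatGPT-generated prompt
]
GENERATIVE_MODELS = ["Stable Diffusion", "DeepFloyd-IF"]  # models named in the TL;DR


def build_generation_plan(categories, templates, models):
    """Return one generation job per (category, prompt, model) combination."""
    plan = []
    for category, template, model in product(categories, templates, models):
        plan.append({
            "category": category,
            "prompt": template.format(category=category),
            "model": model,
        })
    return plan


plan = build_generation_plan(CATEGORIES, PROMPT_TEMPLATES, GENERATIVE_MODELS)
for job in plan:
    print(job["model"], "|", job["prompt"])
```

Each job would then be handed to the corresponding text-to-image model. Widening any one of the three lists widens the set of generated samples, which is the intuition behind enhancing diversity along each dimension.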

Abstract

Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving accuracy. Most instance segmentation datasets today require costly manual annotation, which limits their scale. Models trained on such data are prone to overfitting the training set, especially for rare categories. While recent works have explored using generative models to create synthetic datasets for data augmentation, these approaches do not efficiently harness the full potential of generative models. To address these issues, we introduce a more efficient strategy for constructing generative datasets for data augmentation, termed DiverGen. First, we explain the role of generative data from the perspective of distribution discrepancy by investigating how different data affect the distribution learned by the model. We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting. Additionally, we find that the diversity of generative data is crucial for improving model performance, and we enhance it through several strategies: category diversity, prompt diversity, and generative model diversity. With these strategies, we can scale the data to millions of samples while maintaining the trend of model performance improvement. On the LVIS dataset, DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.

Paper Structure

This paper contains 24 sections, 2 equations, 14 figures, 12 tables.

Figures (14)

  • Figure 1: Visualization of data distributions from different sources. Compared to real-world data (LVIS train and LVIS val), generative data (Stable Diffusion and IF) can expand the data distribution that the model can learn.
  • Figure 2: Examples of various generative models. The samples generated by different generative models vary, even within the same category.
  • Figure 3: Overview of the DiverGen pipeline. In instance generation, we enhance data diversity at three levels: category diversity, prompt diversity, and generative model diversity. Next, we use SAM-background to obtain high-quality masks. Then, we use CLIP inter-similarity to filter out low-quality data. Finally, we use the instance paste strategy to increase model learning efficiency on generative data.
  • Figure 4: Examples of generative data using different prompts. By using prompts designed by ChatGPT, the diversity of generated images in terms of shapes, textures, etc. can be significantly improved.
  • Figure 5: Examples of object masks produced by different annotation strategies. SAM-bg can obtain more complete and finer-detailed masks.
  • ...and 9 more figures
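
The filtration step named in the Figure 3 caption can be sketched roughly as follows. The text here only names "CLIP inter-similarity", so the sketch makes several assumptions: it treats the score as the cosine similarity between a generated image's CLIP embedding and a reference built from real-instance embeddings, it uses the mean real embedding as that reference, and it uses an illustrative threshold of 0.6. The toy 3-dimensional vectors stand in for real CLIP image embeddings, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings):
    """Element-wise mean of a list of embeddings (assumed reference for real data)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def filter_by_inter_similarity(generated, real_reference, threshold):
    """Keep names of generated samples whose similarity to the real reference
    is at least `threshold`; samples below it are treated as low quality."""
    kept = []
    for name, embedding in generated:
        if cosine_similarity(embedding, real_reference) >= threshold:
            kept.append(name)
    return kept


# Toy vectors standing in for CLIP image embeddings (assumption).
real_embeddings = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
reference = mean_embedding(real_embeddings)

generated = [
    ("gen_a", [0.9, 0.1, 0.0]),  # close to the real reference
    ("gen_b", [0.0, 0.0, 1.0]),  # orthogonal to it, should be dropped
    ("gen_c", [0.5, 0.5, 0.1]),  # moderately close
]

kept = filter_by_inter_similarity(generated, reference, threshold=0.6)
print(kept)
```

In a real pipeline the embeddings would come from a CLIP image encoder, and the threshold would need to be tuned on held-out data, since a value that is too high discards useful diversity and one that is too low admits poor generations.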