SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu
TL;DR
SegGen tackles the critical bottleneck of limited segmentation data by reversing the traditional data-generation pipeline: it first learns to synthesize segmentation masks from text (Text2Mask) and then generates images conditioned on those masks (Mask2Img). Two complementary strategies, MaskSyn (diverse masks and images) and ImgSyn (diverse images for real masks), enable large-scale, high-quality synthetic data without segmentation-labeler modules. Across ADE20K and COCO, SegGen yields consistent gains on semantic, panoptic, and instance segmentation and improves robustness to unseen domains, including when trained purely on synthetic data. These results demonstrate that high-quality synthetic data can approach real-data performance, reducing annotation costs and enabling better generalization in real-world scenarios.
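The data flow of the two strategies can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the `Sample` type, the function names `mask_syn` and `img_syn`, and the toy generators are all hypothetical stand-ins for the learned Text2Mask and Mask2Img models, and passing only the mask to the Mask2Img stub is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

Mask = List[List[int]]   # per-pixel class ids (toy representation)
Image = List[List[int]]  # placeholder pixel grid (toy representation)


@dataclass
class Sample:
    image: Image
    mask: Mask
    source: str  # "masksyn" or "imgsyn"


def mask_syn(captions: List[str],
             text2mask: Callable[[str], Mask],
             mask2img: Callable[[Mask], Image]) -> List[Sample]:
    """MaskSyn: text -> new mask -> new image (both mask and image are synthetic)."""
    samples = []
    for caption in captions:
        mask = text2mask(caption)   # Text2Mask step
        image = mask2img(mask)      # Mask2Img step
        samples.append(Sample(image, mask, "masksyn"))
    return samples


def img_syn(real_masks: List[Mask],
            mask2img: Callable[[Mask], Image]) -> List[Sample]:
    """ImgSyn: existing (human-annotated) mask -> new image; the mask is reused."""
    return [Sample(mask2img(mask), mask, "imgsyn") for mask in real_masks]


# Toy stand-ins for the learned generators (hypothetical, for illustration only).
def toy_text2mask(caption: str) -> Mask:
    n = len(caption.split())  # class id derived from the caption's word count
    return [[n, n], [n, n]]


def toy_mask2img(mask: Mask) -> Image:
    return [[10 * v for v in row] for row in mask]


synthetic = mask_syn(["a bed in a room"], toy_text2mask, toy_mask2img)
recolored = img_syn([[[1, 2], [3, 4]]], toy_mask2img)
```

In both cases the mask comes first and the image is generated from it, so every synthetic image has its label by construction and no separate labeling step is needed.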
Abstract
We propose SegGen, a highly effective training data generation method for image segmentation that significantly pushes the performance limits of state-of-the-art segmentation models. SegGen designs and integrates two data generation strategies: MaskSyn and ImgSyn. (i) MaskSyn synthesizes new mask-image pairs via our proposed text-to-mask generation model and mask-to-image generation model, greatly improving the diversity of segmentation masks for model supervision; (ii) ImgSyn synthesizes new images based on existing masks using the mask-to-image generation model, strongly improving image diversity for model inputs. On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation. Notably, in terms of ADE20K mIoU, Mask2Former R50 improves from 47.2 to 49.9 (+2.7), and Mask2Former Swin-L improves from 56.1 to 57.4 (+1.3). These results demonstrate the effectiveness of SegGen even when abundant human-annotated training data is used. Moreover, training with our synthetic data makes segmentation models more robust to unseen domains. Project website: https://seggenerator.github.io
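For readers unfamiliar with the metric behind the reported gains, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth labels. The snippet below is a simplified sketch under stated assumptions: it works on flat lists of per-pixel class ids for a single example, skips classes absent from both prediction and target, and does not model the ignore label or the dataset-level accumulation that benchmark evaluation code typically uses. The function name `mean_iou` is ours.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean IoU over the classes that appear in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and target are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or target is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: intersection 1, union 2 -> 0.5; class 1: intersection 2, union 3 -> 2/3.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

Benchmark numbers such as 47.2 or 49.9 are this quantity expressed as a percentage.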
