DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang

TL;DR

DetDiffusion is the first method to harmonize generative and perceptive models, tackling the challenge of generating effective data for perceptive models. It introduces a perception-aware loss through segmentation, improving both generation quality and controllability.

Abstract

Current perceptive models depend heavily on resource-intensive datasets, prompting the need for innovative solutions. Building on recent advances in diffusion models, synthetic data, generated by constructing image inputs from various annotations, has proven beneficial for downstream tasks. While prior methods have addressed generative and perceptive models separately, DetDiffusion, for the first time, harmonizes both, tackling the challenges in generating effective data for perceptive models. To enhance image generation with perceptive models, we introduce a perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. To boost the performance of specific perceptive models, our method customizes data augmentation by extracting and utilizing perception-aware attributes (P.A. Attr) during generation. Experimental results on the object detection task highlight DetDiffusion's superior performance, establishing a new state of the art in layout-guided generation. Furthermore, images synthesized by DetDiffusion can effectively augment training data, significantly enhancing downstream detection performance.
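The abstract does not spell out how a perception-aware attribute is extracted. As a minimal sketch only, assume an annotated object is tagged `[easy]` when a pretrained detector produces a same-class prediction that overlaps it with IoU above a threshold, and `[hard]` otherwise. The matching rule, the 0.5 threshold, and the names `Box`, `iou`, and `assign_attributes` are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with a class label (hypothetical helper type)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_attributes(
    gt_boxes: List[Box], detections: List[Box], iou_thresh: float = 0.5
) -> List[str]:
    """Tag each annotated box as "[easy]" or "[hard]".

    Assumed rule: a box is "[easy]" if any detection of the same class
    overlaps it with IoU >= iou_thresh, otherwise "[hard]".
    """
    attrs = []
    for gt in gt_boxes:
        matched = any(
            d.label == gt.label and iou(gt, d) >= iou_thresh
            for d in detections
        )
        attrs.append("[easy]" if matched else "[hard]")
    return attrs


# Toy example: the dog is detected, the cat is missed by the detector.
gt = [Box("dog", 0, 0, 10, 10), Box("cat", 20, 20, 30, 30)]
det = [Box("dog", 1, 1, 10, 10)]
attrs = assign_attributes(gt, det)
```

Under this assumed rule, the resulting tags could be attached to each object in the layout condition so that the same layout can be generated with either attribute.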

Paper Structure (27 sections, 5 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Pipeline comparison between general layout-to-image (L2I) models (e.g., GeoDiffusion [chen2023integrating]) and our DetDiffusion (perception-aware L2I). By utilizing the perception-aware loss (P.A. loss) and perception-aware attributes (P.A. Attr), DetDiffusion improves the generation quality and controllability of the L2I task. Perception-aware attributes further boost the performance of downstream perceptive models. Moreover, the perceptive model adds only 1.3% more parameters.
  • Figure 2: Model architecture of DetDiffusion. To facilitate the synergy between generative and perceptive models, we integrate two components into the L2I training pipeline. The perception-aware loss (P.A. loss) leverages the segmentation head for better generation quality and controllability. The perception-aware attribute (P.A. Attr) enables DetDiffusion to generate highly usable data for training augmentation.
  • Figure 3: Three strategies for attribute application. See Sec. \ref{sec:settings} for the detailed definitions.
  • Figure 4: Qualitative comparison on the Microsoft COCO dataset. Our DetDiffusion can generate highly realistic images consistent with the provided semantic layouts.
  • Figure 5: Qualitative comparison on the perception-aware attribute. Even with exactly the same semantic layouts, simply switching the perception-aware attribute (P.A. Attr) between $[easy]$ (left) and $[hard]$ (right) effectively alters the low-level image patterns of the generated images. The former yields better detector recognizability, while the latter yields better augmentation samples, as demonstrated in Tables \ref{tab:fidelity} and \ref{tab:trainability}, respectively.
  • ...and 5 more figures