Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs

Viktor Seib, Malte Roosen, Ida Germann, Stefan Wirtz, Dietrich Paulus

TL;DR

This work presents a three-GAN augmentation pipeline that, in sequence, generates semantic maps, inserts pedestrian instances, and translates the maps into photo-realistic images to augment pedestrian-detection datasets. By chaining SemGAN, a contextual instance-insertion model, and SPADE, the method uses semantic-to-image translation to create synthetic training data, and reports improved detection performance for near- and far-range pedestrians when the synthetic data is combined with real data. Key contributions include the adaptations needed to handle discrete semantic maps, the integration of instance insertion, and an evaluation of the impact on a YOLOv3 detector, with ablations highlighting the importance of retraining the detection heads on synthetic data. The approach addresses data scarcity and domain shift and offers a practical route to expanding urban-scene datasets. The authors note that the first step is a bottleneck and call for collecting task-specific training data for all pipeline stages.

Abstract

Creating annotated datasets demands a substantial amount of manual effort. In this proof-of-concept work, we address this issue by proposing a novel image generation pipeline. The pipeline consists of three distinct, previously published generative adversarial networks, combined in a novel way to augment a dataset for pedestrian detection. Although the generated images are not always visually pleasing to the human eye, our detection benchmark reveals that the results substantially surpass the baseline. The presented proof-of-concept work was done in 2020 and is now published as a technical report after a three-year retention period.

Paper Structure (12 sections, 8 figures, 3 tables)

Figures (8)

  • Figure 1: The augmentation pipeline consisting of three GANs that generate semantic maps and images from a latent variable $\mathbf{z}$. In the first step, we use SemGAN [ghelfi2019adversarial] to generate a semantic map. In the second step, the method proposed by Lee et al. [lee2018context] is used to insert a new object instance (person) into the semantic map. Finally, SPADE [park2019semantic] is used to convert the semantic map into an RGB image. The images shown are for illustration purposes only; they are not the actual output.
  • Figure 2: Our implementation of the SemGAN generator with the described modifications.
  • Figure 3: Our implementation of the SemGAN discriminator with the described modifications.
  • Figure 4: Semantic maps generated by the SemGAN network (output of the first pipeline step).
  • Figure 5: Semantic maps with inserted person instances (output of the second pipeline step). Inserted instances are highlighted by a white border for visualization purposes.
  • ...and 3 more figures
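
The data flow described in the Figure 1 caption (latent variable, then semantic map, then inserted person, then RGB image) can be illustrated with a minimal sketch. This is not the authors' implementation: each function below is a trivial NumPy stand-in for the corresponding trained network, and the class count, the `PERSON_CLASS` label id, the image size, the fixed rectangular person shape, and the returned bounding box are all assumptions made for illustration.

```python
import numpy as np

NUM_CLASSES = 8          # hypothetical number of semantic classes
PERSON_CLASS = 7         # hypothetical label id reserved for "person"
HEIGHT, WIDTH = 64, 128  # hypothetical map resolution
LATENT_DIM = 16          # hypothetical size of the latent variable z


def generate_semantic_map(z, rng):
    """Stand-in for step 1 (SemGAN): latent vector -> discrete label map.

    A random linear projection replaces the trained generator; the argmax
    over per-class scores yields one class id per pixel (person excluded).
    """
    weights = rng.standard_normal((NUM_CLASSES - 1, HEIGHT, WIDTH, z.size))
    scores = weights @ z  # shape: (NUM_CLASSES - 1, HEIGHT, WIDTH)
    return scores.argmax(axis=0)


def insert_person(sem_map, top, left, height, width):
    """Stand-in for step 2 (instance insertion): paste a person region.

    The real model predicts where and in what shape to insert the instance;
    here the location is passed in and the shape is a plain rectangle.
    """
    out = sem_map.copy()
    out[top:top + height, left:left + width] = PERSON_CLASS
    box = (left, top, left + width, top + height)  # x_min, y_min, x_max, y_max
    return out, box


def render_image(sem_map, palette):
    """Stand-in for step 3 (SPADE): label map -> RGB image via color lookup."""
    return palette[sem_map]


def run_pipeline(seed=0):
    """Chain the three stand-in stages in the order given in Figure 1."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(LATENT_DIM)
    sem_map = generate_semantic_map(z, rng)
    sem_map, box = insert_person(sem_map, top=20, left=50, height=30, width=12)
    palette = rng.integers(0, 256, size=(NUM_CLASSES, 3), dtype=np.uint8)
    image = render_image(sem_map, palette)
    return sem_map, box, image


if __name__ == "__main__":
    sem_map, box, image = run_pipeline()
    print(sem_map.shape, box, image.shape)
```

The point of the sketch is only the interface between stages: a discrete label map is the shared representation, so the insertion step can edit it directly before the final stage turns it into pixels.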