Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs
Viktor Seib, Malte Roosen, Ida Germann, Stefan Wirtz, Dietrich Paulus
TL;DR
This work presents a three-GAN augmentation pipeline that sequentially generates semantic maps, inserts pedestrian instances, and translates the maps to photo-realistic images in order to augment pedestrian-detection datasets. By chaining SemGAN, a contextual instance-insertion model, and SPADE, the method uses semantic-to-image translation to create synthetic training data, and it demonstrates improved detection performance for near- and far-range pedestrians when the synthetic data is combined with real data. Key contributions include the adaptations needed to handle discrete semantic maps, the integration of instance insertion, and an evaluation of the impact on a YOLOv3 detector, with ablations highlighting the importance of retraining the detection heads on synthetic data. The approach addresses data scarcity and domain shift and offers a practical route to expanding urban-scene datasets. The authors note bottlenecks in the first step and call for collecting task-specific training data for all pipeline stages.
Abstract
Creating annotated datasets demands a substantial amount of manual effort. In this proof-of-concept work, we address this issue by proposing a novel image generation pipeline. The pipeline consists of three distinct, previously published generative adversarial networks, combined in a novel way to augment a dataset for pedestrian detection. Although the generated images are not always visually pleasing to the human eye, our detection benchmark reveals that the results substantially surpass the baseline. The presented proof-of-concept work was done in 2020 and is now published as a technical report after a three-year retention period.
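The chaining of the three stages (semantic map generation, pedestrian insertion, map-to-image translation) can be illustrated with a minimal sketch. All function names, class IDs, and the colour palette below are hypothetical placeholders: each function stands in for a trained network and implements no GAN at all. The sketch only shows how data could flow through such a pipeline, and how the insertion step yields a bounding-box annotation as a by-product.

```python
import random
from typing import List, Tuple

SemanticMap = List[List[int]]               # one class ID per pixel
Image = List[List[Tuple[int, int, int]]]    # one RGB triple per pixel
Box = Tuple[int, int, int, int]             # (x_min, y_min, x_max, y_max), inclusive

# Hypothetical class IDs and colours, chosen only for this illustration.
ROAD, BUILDING, PEDESTRIAN = 0, 1, 2
PALETTE = {ROAD: (90, 90, 90), BUILDING: (150, 120, 100), PEDESTRIAN: (220, 40, 60)}


def generate_semantic_map(rng: random.Random, height: int, width: int) -> SemanticMap:
    """Stage 1 placeholder (stands in for a semantic-map generator)."""
    horizon = rng.randint(1, height - 2)
    return [[BUILDING if y < horizon else ROAD for _ in range(width)]
            for y in range(height)]


def insert_pedestrian(sem_map: SemanticMap, rng: random.Random) -> Tuple[SemanticMap, Box]:
    """Stage 2 placeholder (stands in for an instance-insertion model).

    Paints a rectangular pedestrian region and returns its bounding box.
    """
    height, width = len(sem_map), len(sem_map[0])
    box_h = rng.randint(2, max(2, height // 2))
    box_w = max(1, box_h // 2)
    x0 = rng.randint(0, width - box_w)
    y0 = rng.randint(0, height - box_h)
    out = [row[:] for row in sem_map]       # copy so the input map is untouched
    for y in range(y0, y0 + box_h):
        for x in range(x0, x0 + box_w):
            out[y][x] = PEDESTRIAN
    return out, (x0, y0, x0 + box_w - 1, y0 + box_h - 1)


def translate_to_image(sem_map: SemanticMap) -> Image:
    """Stage 3 placeholder (stands in for a semantic-to-image translator)."""
    return [[PALETTE[label] for label in row] for row in sem_map]


def synthesize_sample(rng: random.Random, height: int, width: int) -> Tuple[Image, List[Box]]:
    """Run the three stages in sequence; return an image and its box labels."""
    sem_map = generate_semantic_map(rng, height, width)
    sem_map, box = insert_pedestrian(sem_map, rng)
    return translate_to_image(sem_map), [box]


if __name__ == "__main__":
    image, boxes = synthesize_sample(random.Random(0), height=16, width=32)
    print(len(image), len(image[0]), boxes)
```

In a sketch of this kind, the annotation comes for free because the pedestrian location is decided in the semantic map before any image is rendered, so no manual labelling of the synthetic image is required.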
