Semantically Consistent Person Image Generation
Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein
TL;DR
This work tackles scene-aware person image generation: inserting a new person into a complex scene while preserving its global context. It introduces a three-stage pipeline: a Pix2PixHD-based coarse semantic map estimator, a data-driven refinement step that selects the closest match from a clustered knowledge base of semantic maps, and an exemplar-driven, multi-scale attention renderer for appearance transfer. Key contributions include a clustering-based refinement that improves realism and diversity, a pose-conditioned rendering framework trained with perceptual and adversarial losses, and extensive ablations validating the importance of each stage. The approach enables realistic, controllable person insertion in cluttered scenes, shows improvements over several baselines along with rich qualitative results, and offers practical utility for augmented reality and video synthesis applications.
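The clustering-based refinement can be illustrated with a small sketch. It assumes binary person masks of a fixed size flattened into 0/1 lists, plain k-means (Lloyd's algorithm) as the clustering method, and intersection-over-union (IoU) as the matching score. None of these choices, nor the hypothetical names `build_knowledge_base` and `select_near_match`, are taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two flattened masks or centroids."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def iou(a, b):
    """Intersection-over-union of two flattened binary masks (assumed 0/1)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def kmeans(masks, k, iters=20, seed=0):
    """Plain Lloyd's k-means over flattened masks (assumed offline step)."""
    rng = random.Random(seed)
    centroids = [[float(v) for v in m] for m in rng.sample(masks, k)]
    labels = [0] * len(masks)
    for _ in range(iters):
        # Assignment step: each mask goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(m, centroids[c])) for m in masks]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [m for m, lab in zip(masks, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def build_knowledge_base(masks, k, seed=0):
    """Hypothetical helper: cluster fine masks into k groups."""
    centroids, labels = kmeans(masks, k, seed=seed)
    clusters = [[m for m, lab in zip(masks, labels) if lab == c] for c in range(k)]
    return centroids, clusters


def select_near_match(coarse, centroids, clusters, top_k=1, rng=None):
    """Hypothetical helper: route the coarse mask to its nearest non-empty
    cluster, then pick one of the top_k members ranked by IoU."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(coarse, centroids[c]))
    for c in order:
        if clusters[c]:
            ranked = sorted(clusters[c], key=lambda m: iou(coarse, m), reverse=True)
            return (rng or random.Random(0)).choice(ranked[:top_k])
    raise ValueError("knowledge base is empty")
```

Searching only the nearest cluster keeps retrieval cheap, at the cost of occasionally missing a better match stored in another cluster. Raising `top_k` above 1 trades exactness for variety. That is one plausible way retrieval could contribute to diversity, but it is an assumption here, not the authors' stated mechanism.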
Abstract
We propose a data-driven approach for context-aware person image generation. Specifically, we attempt to generate a person image such that the synthesized instance blends into a complex scene. In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene. The proposed technique is divided into three sequential steps. First, we employ a Pix2PixHD model to infer a coarse semantic mask that represents the new person's spatial location, scale, and potential pose. Next, we use a data-centric approach to select the closest representation from precomputed clusters of fine semantic masks. Finally, we adopt a multi-scale, attention-guided architecture to transfer appearance attributes from an exemplar image. The proposed strategy enables us to synthesize realistic, semantically coherent persons that blend into an existing scene without altering its global context. We support our findings with qualitative and quantitative evaluations.
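The handoff between the first two steps can be sketched as follows. The bounding box of the coarse mask supplies the person's location and scale. The cropped mask is rescaled to a canonical resolution so it can be compared with a bank of fine masks. The retrieved fine mask is then resized back into the same box. Everything here is a hypothetical simplification: 2D binary masks (real semantic maps may carry per-part labels), the canonical size `(8, 4)`, nearest-neighbour resizing, exhaustive IoU search instead of a clustered lookup, and the helper name `refine`.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the nonzero region; bottom/right exclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("coarse mask contains no person pixels")
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of labels."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def iou(a, b):
    """Intersection-over-union of two equally sized 2D binary masks."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    inter = sum(1 for x, y in zip(fa, fb) if x and y)
    union = sum(1 for x, y in zip(fa, fb) if x or y)
    return inter / union if union else 1.0


def refine(coarse, bank, canon=(8, 4)):
    """Hypothetical helper: swap a coarse person mask for its closest fine
    mask from `bank` (all of size `canon`), keeping location and scale."""
    top, left, bottom, right = bounding_box(coarse)          # location + scale (step 1 output)
    crop = [row[left:right] for row in coarse[top:bottom]]
    query = resize_nearest(crop, *canon)                       # scale-normalised query
    best = max(bank, key=lambda m: iou(query, m))              # closest fine mask (step 2)
    placed = resize_nearest(best, bottom - top, right - left)  # back to the original scale
    out = [[0] * len(coarse[0]) for _ in coarse]
    for r, row in enumerate(placed):
        out[top + r][left:right] = row
    return out
```

Normalizing by the bounding box lets one fine mask serve persons at any position and size in the scene. The exemplar-driven rendering step would then consume the refined mask; it is not sketched here.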
