Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts

Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem

TL;DR

This paper tackles automatic outdoor scene synthesis by conditioning image generation on semantic layouts and transient scene attributes. It introduces AL-CGAN, a conditional GAN where the generator uses layout and attribute conditioning and a Siamese discriminator fuses image and conditioning features, enabling precise object boundaries and diverse appearances. The model is trained on a fusion of ADE20K outdoor images and Transient Attributes data, demonstrating layout-controlled drawing of scene elements and attribute-driven appearance changes, including incremental scene editing. An ablation study shows that both layout and attribute conditioning improve realism, with future work aiming to extend to natural language conditioning.

Abstract

Automatic image synthesis research has been growing rapidly as deep networks become more and more expressive. In the last couple of years, we have observed images of digits, indoor scenes, birds, chairs, etc. being automatically generated. The expressive power of image generators has also been enhanced by introducing several forms of conditioning variables, such as object names, sentences, bounding boxes, and key-point locations. In this work, we propose a novel deep conditional generative adversarial network architecture that draws its strength from the semantic layout and scene attributes integrated as conditioning variables. We show that our architecture is able to generate realistic outdoor scene images with clear object boundaries under different conditions, e.g., day-night or sunny-foggy.
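
To make the idea of conditioning on a semantic layout and scene attributes concrete, here is a minimal sketch of how the two inputs could be packed into a single stack of feature maps. The text above does not specify how the conditioning variables are encoded, so everything here is an assumption for illustration: the function names, the one-hot encoding of the layout, and the spatial tiling of the attribute vector are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the encoding below (one-hot layout maps plus
# spatially tiled attribute values) is an assumption, not the paper's method.

def one_hot_layout(layout, num_classes):
    """Turn an H x W grid of class ids into num_classes binary H x W maps."""
    return [
        [[1.0 if label == c else 0.0 for label in row] for row in layout]
        for c in range(num_classes)
    ]


def tile_attributes(attributes, height, width):
    """Replicate each scalar attribute over a full H x W map."""
    return [[[a] * width for _ in range(height)] for a in attributes]


def build_condition(layout, attributes, num_classes):
    """Stack layout maps and attribute maps along the channel axis."""
    height, width = len(layout), len(layout[0])
    return one_hot_layout(layout, num_classes) + tile_attributes(
        attributes, height, width
    )


# Hypothetical 2 x 3 layout with class ids 0 (sky), 1 (mountain), 2 (tree)
layout = [[0, 0, 0],
          [1, 1, 2]]
# Hypothetical attribute strengths, e.g. "sunny" and "night"
attributes = [0.9, 0.1]

condition = build_condition(layout, attributes, num_classes=3)
print(len(condition))  # number of channels: 3 layout maps + 2 attribute maps
```

In a setup like this, changing only `attributes` while keeping `layout` fixed would correspond to altering the scene's appearance without moving its objects, which is the kind of control the abstract describes.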

Paper Structure

This paper contains 11 sections, 2 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Our conditional generative adversarial network synthesizes realistic outdoor images from semantic layouts and transient scene attributes (Images generated automatically using a layout seen during training).
  • Figure 2: The architectures of the generator and discriminator networks in our AL-CGAN model.
  • Figure 3: Semantic layout conditioned outdoor scene generation using our AL-CGAN. The input layouts are taken from images in the SIFTflow [LYT11] and LMSun [TL13] datasets, hence they are previously unseen. The transient scene attributes are fixed to the "clear sunny day" vector throughout the experiment.
  • Figure 4: AL-CGAN samples generated from the same semantic layout (shown in the middle) by modulating the noise vector, i.e., $z$. Rather than copying previously seen scenes, our model is able to generate diverse samples.
  • Figure 5: Increasing night, sunset, cloud and rain attributes. The AL-CGAN model is trained with 9201 ADE20K images and fine-tuned with images from the Transient Attributes dataset (we provide more results in the supplementary material).
  • ...and 9 more figures