Content-Aware Preserving Image Generation

Giang H. Le, Anh Q. Nguyen, Byeongkeun Kang, Yeejin Lee

TL;DR

To validate the effectiveness of the proposed framework in preserving content attributes, extensive experiments are conducted on widely used benchmark datasets, including Flickr-Faces-High Quality, Animal Faces High Quality, and Large-scale Scene Understanding datasets.

Abstract

Remarkable progress has been achieved in image generation with the introduction of generative models. However, precisely controlling the content in generated images remains a challenging task due to the fundamental training objective of these models. This paper addresses this challenge by proposing a novel image generation framework explicitly designed to incorporate desired content in output images. The framework utilizes advanced encoding techniques, integrating subnetworks called content fusion and frequency encoding modules. The frequency encoding module first captures features and structures of reference images by exclusively focusing on selected frequency components. Subsequently, the content fusion module generates a content-guiding vector that encapsulates desired content features. During the image generation process, content-guiding vectors from real images are fused with projected noise vectors. This ensures that the generated images not only maintain consistent content from guiding images but also exhibit diverse stylistic variations. To validate the effectiveness of the proposed framework in preserving content attributes, extensive experiments are conducted on widely used benchmark datasets, including Flickr-Faces-High Quality, Animal Faces High Quality, and Large-scale Scene Understanding datasets.

Paper Structure

This paper contains 20 sections, 8 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Comparison of (a) typical image generation with (b) the proposed content-preserving image generation. Typical GANs generate random images following the distribution of real images. In contrast, the proposed framework allows users to exert control over image generation, enabling them to specify desired content attributes in the generated images.
  • Figure 2: Description of the main components of the proposed framework during training. (a) Overall structure. (b) Composition of the component manipulation block (CMB). (c) Composition of the projecting block (PB). During the training process, the frequency encoding module generates a content-guiding vector from a frequency-analyzed image. This vector is designed to closely align with the feature embedding of the reference image $\mathbf{x}$. This alignment enables effective control over the content of the generated images during the inference stage. Note that BatchNorm in CMB refers to batch normalization \cite{batchnorm}, GAP in PB stands for global average pooling, FC denotes a fully connected layer, and FT represents feature transformation.
  • Figure 3: Inference phase of the proposed framework. The trained frequency encoding module takes a real image as an input, extracts content features from it, and transfers these features to the generator.
  • Figure 4: Examples of different composition factors according to different frequency bands. (a) Magnitude of DFT. DC is in the middle. (b) Low-pass ($Y_{L}$) filtered DFT with $b_{L} = 30$ of (a). (c) High-pass ($Y_{H}$) filtered DFT with $b_{H} = 30$ of (a). (d) Inverse DFT of size $128 \times 128$ of (a). (e) Inverse DFT of (b). The overall layout of the original image is preserved, while the boundary regions are smoothed. (f) Inverse DFT of (c). This filter highlights the edges in the image while suppressing the homogeneous components.
  • Figure 5: Examples of (a) images generated by the proposed framework trained on FFHQ \cite{stylegans} and (b) their corresponding attributes, as determined by the classifiers described in Section \ref{sec:content_results}. In (a), the images in the first column are real input images, and the images in the first row are generated from a projected vector $\mathbf{w}_{2}$. These examples demonstrate the preservation of real image content in the row-wise direction and control over color distribution and background in the column-wise direction for generated images. The table in (b) displays examples of the classified attributes. Each cell in the table contains the abbreviations of the attributes of the image at the corresponding position in (a).
  • ...and 7 more figures
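
The band splitting described in the Figure 4 caption can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names (`band_masks`, `split_frequency_bands`) are hypothetical, and the shape of the pass band is an assumption, since the caption gives only the widths $b_{L} = 30$ and $b_{H} = 30$. Here the low-pass mask is taken to be a centered square window of half-width `b_low` around DC, and the high-pass mask is everything outside a window of half-width `b_high`.

```python
import numpy as np


def band_masks(shape, b_low, b_high):
    """Boolean low-/high-pass masks for a spectrum with DC shifted to the centre.

    Assumption: the pass band is a centred square window; the source
    caption does not state the window shape.
    """
    rows, cols = shape
    r = np.abs(np.arange(rows) - rows // 2)
    c = np.abs(np.arange(cols) - cols // 2)
    # Chebyshev distance from the DC bin gives a square window.
    dist = np.maximum(r[:, None], c[None, :])
    return dist <= b_low, dist > b_high


def split_frequency_bands(image, b_low=30, b_high=30):
    """Return the low-pass and high-pass reconstructions of a 2-D image."""
    # Centre the DC component, as in panel (a) of the figure.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    low_mask, high_mask = band_masks(image.shape, b_low, b_high)
    # Undo the shift before inverting; the input is real, so keep the real part.
    x_low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    x_high = np.fft.ifft2(np.fft.ifftshift(spectrum * high_mask)).real
    return x_low, x_high


# Random stand-in for a 128 x 128 grayscale reference image.
rng = np.random.default_rng(0)
image = rng.random((128, 128))
x_low, x_high = split_frequency_bands(image)
```

With equal widths the two masks are complementary, so `x_low + x_high` reconstructs the input up to floating-point error. The high-pass output has approximately zero mean because the DC bin is excluded, which is consistent with the caption's remark that homogeneous components are suppressed.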