
LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao, Qifeng Chen

TL;DR

LinkGAN addresses the lack of explicit latent-to-pixel linkage in GANs by introducing a regularizer that partitions latent codes and image regions, enforcing that each latent subspace controls a corresponding image region. The method enables precise local edits for both 2D and 3D-aware generation, including fixed and semantic regions and multiple-region configurations, while remaining compatible with GAN inversion. Empirical results on FFHQ, AFHQ, LSUN-Church, and LSUN-Car show improved local controllability with only modest degradation in synthesis quality, and ablations indicate effective control with around 64 axes per linked region. This approach advances spatial controllability in GANs and opens pathways for real-image editing and region-aware synthesis without extensive architectural changes.
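
As a rough illustration of what "enforcing that each latent subspace controls a corresponding image region" could mean, here is a minimal sketch. It assumes a two-sided squared-error penalty and a toy linear generator, and it is not the paper's exact regularizer. Resampling the linked axes should leave pixels outside the region unchanged, and resampling the remaining axes should leave pixels inside the region unchanged. The names `linkage_loss`, `linked_gen`, and `dense_gen` are hypothetical.

```python
import numpy as np

def linkage_loss(generator, z, linked_axes, mask, rng):
    """Hypothetical two-sided linkage penalty (illustrative assumption, not the paper's loss).

    generator:   callable mapping a latent vector of shape (d,) to an image of shape (H, W)
    z:           latent code, shape (d,)
    linked_axes: indices of the latent axes meant to control the region
    mask:        boolean array (H, W), True inside the linked region
    """
    d = z.shape[0]
    unlinked_axes = np.setdiff1d(np.arange(d), linked_axes)
    base = generator(z)

    # Resample only the linked axes: pixels OUTSIDE the region should not move.
    z_in = z.copy()
    z_in[linked_axes] = rng.standard_normal(len(linked_axes))
    leak_out = np.mean((generator(z_in) - base)[~mask] ** 2)

    # Resample only the unlinked axes: pixels INSIDE the region should not move.
    z_out = z.copy()
    z_out[unlinked_axes] = rng.standard_normal(len(unlinked_axes))
    leak_in = np.mean((generator(z_out) - base)[mask] ** 2)

    return float(leak_out + leak_in)

# Toy setup: 4x4 single-channel "images", 8-dimensional latent.
rng = np.random.default_rng(0)
H, W, D = 4, 4, 8
mask = np.zeros((H, W), dtype=bool)
mask[:, :2] = True              # left half of the image is the linked region
flat = mask.ravel()
linked = np.arange(4)           # axes 0..3 are meant to control the left half
other = np.arange(4, D)

A = rng.standard_normal((H * W, D))    # dense toy generator weights
A_linked = A.copy()
A_linked[np.ix_(flat, other)] = 0.0    # in-region pixels ignore unlinked axes
A_linked[np.ix_(~flat, linked)] = 0.0  # out-region pixels ignore linked axes

def dense_gen(z):
    return (A @ z).reshape(H, W)

def linked_gen(z):
    return (A_linked @ z).reshape(H, W)

z = rng.standard_normal(D)
print(linkage_loss(linked_gen, z, linked, mask, rng))  # zero up to float rounding
print(linkage_loss(dense_gen, z, linked, mask, rng))   # clearly positive
```

A generator whose weights already respect the partition incurs no penalty, while a dense one is penalized. Minimizing such a term during training would push a generator toward the first behavior.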

Abstract

This work presents an easy-to-use regularizer for GAN training that explicitly links some axes of the latent space to a set of pixels in the synthesized image. Establishing such a connection makes local control of GAN generation more convenient: users can alter the image content only within a given spatial area simply by partially resampling the latent code. Experimental results confirm four appealing properties of our regularizer, which we call LinkGAN. (1) The latent-pixel linkage applies either to a fixed region (i.e., the same for all instances) or to a particular semantic category (i.e., varying across instances), such as the sky. (2) Two or more regions can be independently linked to different latent axes, which further supports joint control. (3) Our regularizer can improve the spatial controllability of both 2D and 3D-aware GAN models with barely any sacrifice in synthesis performance. (4) Models trained with our regularizer are compatible with GAN inversion techniques and maintain editability on real images.
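
Partial resampling, as described above, means redrawing only some coordinates of the latent code while keeping the rest fixed. The following minimal sketch makes several assumptions. It uses a 512-dimensional standard-normal latent and a hypothetical partition that gives each of two regions its own disjoint block of 64 axes. The 64 echoes the ablation mentioned in the TL;DR, and the names `LINKED_AXES`, `resample_axes`, and `edit_regions` are illustrative. The sketch shows single-region and joint edits.

```python
import numpy as np

LATENT_DIM = 512  # assumed latent size; illustrative only

# Hypothetical partition: each linked region owns a disjoint block of 64 axes.
LINKED_AXES = {
    "region_a": np.arange(0, 64),
    "region_b": np.arange(64, 128),
}

def resample_axes(z, axes, rng):
    """Return a copy of z with only the given axes redrawn from N(0, 1)."""
    z_new = z.copy()
    z_new[axes] = rng.standard_normal(len(axes))
    return z_new

def edit_regions(z, regions, rng):
    """Jointly resample the axes linked to the named regions; all other axes stay fixed."""
    axes = np.concatenate([LINKED_AXES[name] for name in regions])
    return resample_axes(z, axes, rng)

rng = np.random.default_rng(0)
z = rng.standard_normal(LATENT_DIM)
z_a = edit_regions(z, ["region_a"], rng)               # edit one region
z_ab = edit_regions(z, ["region_a", "region_b"], rng)  # joint edit of two regions

changed_a = np.flatnonzero(z_a != z)
changed_ab = np.flatnonzero(z_ab != z)
print(changed_a.min(), changed_a.max())
print(changed_ab.min(), changed_ab.max())
```

Passing `z` and `z_a` through a LinkGAN-trained generator would, per the paper's claims, change only the first region. In an unregularized GAN, the same partial resampling gives no such spatial guarantee.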

Paper Structure

This paper contains 17 sections, 5 equations, 19 figures, and 6 tables.

Figures (19)

  • Figure 1: Precise local control achieved by LinkGAN, where we can manipulate the image content within a spatial region (e.g., a single eye or the right half of the image) or a semantic category (e.g., car) simply by resampling the latent code on a few sparse axes. Our approach works well both for 2D image synthesis, such as StyleGAN2 [stylegan2] (left three columns), and for 3D-aware image synthesis, such as EG3D [Chan2022eg3d] (right two columns). Notably, in the 3D-aware case we can control both the appearance and the underlying geometry.
  • Figure 2: Concept diagram of LinkGAN, where some axes of the latent space are explicitly linked to the image pixels of a spatial area. In this way, we can alter the image content within the linked region simply by resampling the latent code on these axes.
  • Figure 3: Linking latents to a single fixed region, which is pre-selected before training and shared by all instances. Linked latent subspaces and regions are highlighted with red fragments and boxes, respectively, and the heatmaps show the change in pixel values after in-region and out-region resampling. We find that LinkGAN can robustly link the latent to an arbitrary image region.
  • Figure 4: Linking latents to a semantic region (here, church and car), which varies dynamically across instances. LinkGAN manages to precisely control a particular semantic category simply by resampling the latent code on a few sparse axes.
  • Figure 5: Linking latents to multiple regions, where the linked latent subspaces and image regions are highlighted using different colors. Each linked region can be independently controlled by partially resampling the corresponding latent code.
  • ...and 14 more figures
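
The heatmaps in Figure 3 visualize how much each pixel changes after in-region or out-region resampling. The following minimal sketch computes such a map. It assumes images are float arrays of shape (H, W, C) and uses the mean absolute difference over channels, which may differ from the paper's exact visualization. The sketch also adds a hypothetical one-number summary, the share of the total change that lands inside the linked region. The function names are illustrative.

```python
import numpy as np

def change_heatmap(img_before, img_after):
    """Per-pixel mean absolute change over channels; returns shape (H, W)."""
    diff = img_after.astype(np.float64) - img_before.astype(np.float64)
    return np.abs(diff).mean(axis=-1)

def in_region_ratio(heatmap, mask):
    """Hypothetical summary: fraction of the total change that falls inside the mask."""
    total = heatmap.sum()
    return float(heatmap[mask].sum() / total) if total > 0 else 0.0

# Toy example: an 8x8 RGB image whose edit touches only the top-left 4x4 block.
before = np.zeros((8, 8, 3))
after = before.copy()
after[:4, :4, :] = 0.5
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :4] = True

heat = change_heatmap(before, after)
print(heat.shape, in_region_ratio(heat, mask))  # (8, 8) 1.0
```

For a well-linked model, in-region resampling should give a ratio near 1 for the linked mask, and out-region resampling a ratio near 0.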