InfinityGAN: Towards Infinite-Pixel Image Synthesis

Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang

TL;DR

InfinityGAN introduces a two-module, patch-based framework to synthesize infinite-pixel images from finite training data. A neural implicit structure synthesizer (G_S) and a padding-free texture synthesizer (G_T) enforce a shared global appearance while disentangling local structure and texture, enabling seamless assembly of arbitrarily large images. The approach achieves high realism, supports applications such as spatial style fusion and multi-modal outpainting, and delivers significant inference speedups via parallel patch processing. Evaluations on landscape data show favorable global coherence and competitive metrics, illustrating practical impact for high-resolution, arbitrary-field-of-view image synthesis.

Abstract

We present a novel framework, InfinityGAN, for arbitrary-sized image generation. The task is associated with several key challenges. First, scaling existing models to an arbitrarily large image size is resource-constrained, in terms of both computation and availability of large-field-of-view training data. InfinityGAN trains and infers in a seamless patch-by-patch manner with low computational resources. Second, large images should be locally and globally consistent, avoid repetitive patterns, and look realistic. To address these, InfinityGAN disentangles global appearances, local structures, and textures. With this formulation, we can generate images with spatial size and level of detail not attainable before. Experimental evaluation validates that InfinityGAN generates images with superior realism compared to baselines and features parallelizable inference. Finally, we show several applications unlocked by our approach, such as spatial style fusion, multi-modal outpainting, and image inbetweening. All applications can be operated with arbitrary input and output sizes. Please find the full version of the paper at https://openreview.net/forum?id=ufGMqIM0a4b.

Paper Structure

This paper contains 19 sections, 6 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Synthesizing infinite-pixel images from finite-sized training data. A 1024$\times$2048 image composed of 242 patches, independently synthesized by InfinityGAN with spatial fusion of two styles. The generator is trained on 101$\times$101 patches (e.g., marked in top-left) sampled from 197$\times$197 real images. Note that training and inference (of any size) are performed on a single GTX TITAN X GPU. Zoom in for a better view.
  • Figure 2: Overview. The generator of InfinityGAN consists of two modules: a structure synthesizer based on a neural implicit function, and a fully-convolutional texture synthesizer with all positional information removed (see Figure 3). The two networks take four sets of inputs: a global latent variable that defines the holistic appearance of the image, a local latent variable that represents the local and structural variation, a continuous coordinate for learning the neural implicit structure synthesizer, and a set of randomized noises to model fine-grained texture. InfinityGAN synthesizes images of arbitrary size by learning spatially extensible representations.
  • Figure 3: Padding-free generator. (Left) Conventional generators synthesize inconsistent pixels due to zero-padding. Note that the inconsistent region grows exponentially as the network deepens. (Right) In contrast, our padding-free generator synthesizes consistent pixel values regardless of the position in the model receptive field. This property makes it possible to generate patches spatially independently and assemble them into a seamless image with consistent feature values.
  • Figure 4: Qualitative comparison. We show that InfinityGAN produces more favorable holistic appearances than related methods when tested at an extended size of 1024$\times$1024. (NCI: Non-Constant Input, PFG: Padding-Free Generator). More results are shown in Appendix E.
  • Figure 5: LSUN bridge and tower. InfinityGAN synthesizes at 512$\times$512 pixels. We provide more details and samples in Appendix H.
  • ...and 4 more figures
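
The padding-free property described in the Figure 3 caption can be illustrated with a small NumPy sketch. This is a toy example under stated assumptions, not the InfinityGAN texture synthesizer: the helper names (conv2d_valid, conv2d_zero_pad, stack), the two random 3x3 kernels, the tanh nonlinearity, and the 12x20 random "latent" map are all hypothetical stand-ins. It compares a feature map computed in one pass against one stitched from two independently processed patches, once without padding and once with zero-padding.

```python
import numpy as np

def conv2d_valid(x, k):
    """2-D cross-correlation with no padding: the output shrinks by (kernel - 1)."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * x[i:i + h, j:j + w]
    return out

def conv2d_zero_pad(x, k):
    """Same operation, but zero-pads the input so the output keeps the input size."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    return conv2d_valid(np.pad(x, ((ph, ph), (pw, pw))), k)

def stack(x, kernels, conv):
    """Apply each kernel with the given convolution, followed by tanh."""
    for k in kernels:
        x = np.tanh(conv(x, k))
    return x

rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]  # hypothetical toy weights
latent = rng.standard_normal((12, 20))                     # hypothetical shared input map

# Padding-free: two 3x3 layers trim 2 pixels per side, so neighbouring input
# patches must overlap by 4 columns to produce adjacent output patches.
full = stack(latent, kernels, conv2d_valid)                # shape (8, 16)
left = stack(latent[:, :12], kernels, conv2d_valid)        # output columns 0..7
right = stack(latent[:, 8:], kernels, conv2d_valid)        # output columns 8..15
stitched = np.concatenate([left, right], axis=1)
print("padding-free patches match full pass:", np.allclose(stitched, full))

# Zero-padded: each patch sees artificial zeros at its own border, so the
# stitched result disagrees with the single-pass result around the seam.
padded_full = stack(latent, kernels, conv2d_zero_pad)      # shape (12, 20)
padded_stitched = np.concatenate(
    [stack(latent[:, :10], kernels, conv2d_zero_pad),
     stack(latent[:, 10:], kernels, conv2d_zero_pad)], axis=1)
col_err = np.abs(padded_stitched - padded_full).max(axis=0)
mismatch_cols = np.where(col_err > 1e-12)[0]
print("zero-padded mismatch columns:", mismatch_cols)
```

In this toy stack the mismatched band widens by one column per 3x3 layer on each side of the seam; networks that also upsample between layers would widen it faster. The padding-free variant pays for its consistency with a shrinking output, which is why the input patches have to overlap.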