GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds

Jianfeng Xiang, Jiaolong Yang, Yu Deng, Xin Tong

TL;DR

GRAM-HD generates high-resolution, 3D-consistent images by performing super-resolution in 3D space: a shared 2D CNN upsamples the radiance manifolds of GRAM. Manifold gridding samples the 3D surface manifolds into low-resolution (LR) feature maps, which are then upscaled to high-resolution (HR) radiance maps while the volume rendering process is preserved. Training uses a two-stage adversarial regime with pose supervision, plus patch and cross-resolution consistency losses. Results on FFHQ and AFHQv2-CATS show strong 3D consistency and competitive image quality at $1024^2$, with improved efficiency; cached HR representations enable real-time free-view synthesis.

Abstract

Recent works have shown that 3D-aware GANs trained on unstructured single image collections can generate multiview images of novel instances. The key underpinnings to achieve this are a 3D radiance field generator and a volume rendering process. However, existing methods either cannot generate high-resolution images (e.g., only up to 256×256) due to the high computation cost of neural volume rendering, or rely on 2D CNNs for image-space upsampling, which jeopardizes the 3D consistency across different views. This paper proposes a novel 3D-aware GAN that can generate high-resolution images (up to 1024×1024) while keeping strict 3D consistency as in volume rendering. Our motivation is to achieve super-resolution directly in the 3D space to preserve 3D consistency. We avoid the otherwise prohibitively expensive computation cost by applying 2D convolutions on a set of 2D radiance manifolds defined in the recent generative radiance manifold (GRAM) approach, and apply dedicated loss functions for effective GAN training at high resolution. Experiments on FFHQ and AFHQv2 datasets show that our method can produce high-quality 3D-consistent results that significantly outperform existing methods. It makes a significant step towards closing the gap between traditional 2D image generation and 3D-consistent free-view generation.
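
The consistency argument in the abstract rests on the final image still being composited along each ray over its intersections with the radiance manifolds. The following is a minimal sketch of that compositing step, assuming standard front-to-back alpha compositing over per-intersection occupancy values; the function name `composite_along_rays` and the array layout are illustrative assumptions, not taken from the paper or its code.

```python
import numpy as np

def composite_along_rays(alpha, color):
    """Front-to-back compositing over ray-manifold intersections.

    alpha: (num_rays, num_manifolds) occupancy in [0, 1] at each
           intersection, ordered from nearest to farthest (assumed layout).
    color: (num_rays, num_manifolds, 3) radiance at each intersection,
           e.g. sampled from the HR radiance maps.
    Returns the composited RGB per ray and the per-intersection weights.
    """
    alpha = np.asarray(alpha, dtype=np.float64)
    color = np.asarray(color, dtype=np.float64)

    # Transmittance reaching intersection j: product of (1 - alpha_k) for k < j.
    passed = np.cumprod(1.0 - alpha, axis=1)
    transmittance = np.concatenate(
        [np.ones_like(alpha[:, :1]), passed[:, :-1]], axis=1
    )

    weights = transmittance * alpha
    rgb = (weights[..., None] * color).sum(axis=1)
    return rgb, weights

# One ray crossing two manifolds: a half-transparent red surface
# in front of an opaque blue one.
rgb, weights = composite_along_rays(
    alpha=[[0.5, 1.0]],
    color=[[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]],
)
```

Because only the radiance values fed into this step change with resolution, not the compositing itself, a sharper radiance map yields a sharper image without introducing any view-dependent 2D post-processing.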

Paper Structure

This paper contains 54 sections, 15 equations, 17 figures, 5 tables.

Figures (17)

  • Figure 1: The overall framework of our GRAM-HD method. The generator consists of two components: the radiance manifold generator and the manifold super-resolution module. The former generates radiance and feature manifolds that represent an LR 3D scene. Through manifold gridding, the manifolds are sampled into discrete 2D feature maps. The super-resolution module then processes these feature maps and outputs HR radiance maps. Finally, an HR image can be rendered by computing ray-manifold intersections and integrating their radiance sampled from the HR radiance maps. Note that, unlike previous works that use 2D image super-resolution for HR image generation, we perform super-resolution directly on the 3D representation and keep the volume rendering paradigm, thus preserving the strong 3D consistency of the output images.
  • Figure 2: Left: Rendered images and radiance maps on the surface manifolds. Three sampled manifolds are shown here. The corresponding LR results before super-resolution are presented at bottom right. Right: Extracted proxy 3D shapes at HR (top row) and LR (bottom row). The rendered images are shown at bottom right for reference.
  • Figure 3: Qualitative comparison with recent 3D-aware GANs. The cat images of StyleNeRF are taken from their paper and were produced by a model trained on all images in AFHQv2; our training of StyleNeRF on cat images failed. (Best viewed with zoom-in)
  • Figure 4: Comparison of 3D consistency using spatiotemporal line textures akin to the Epipolar Line Images (EPI) (Bolles et al., 1987). We rotate the camera horizontally and stack the texture of a fixed horizontal line segment. Our method leads to a natural and smooth texture pattern, whereas others yield distorted and/or noisy patterns, indicating different degrees of 3D inconsistency. Detailed explanations can be found in the text. (See the accompanying video for results under continuous view change.)
  • Figure 5: Left: Sample results w/ and w/o the patch adversarial loss $\mathcal{L}_\mathrm{patch}$ at $1024^2$ resolution. $\mathcal{L}_\mathrm{patch}$ can effectively eliminate checkerboard artifacts. Right: Sample results w/ and w/o the cross-resolution consistency loss $\mathcal{L}_\mathrm{cons}$. Some unwanted floaters appear in front of the faces without $\mathcal{L}_\mathrm{cons}$.
  • ...and 12 more figures
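
The spatiotemporal line texture described in the Figure 4 caption can be built from any sequence of rendered views by taking the same horizontal line segment from each frame and stacking the results. The following is a minimal sketch under stated assumptions: the function name `spatiotemporal_line_texture`, the frame layout, and the synthetic test frames are illustrative and do not come from the paper.

```python
import numpy as np

def spatiotemporal_line_texture(frames, row, col_start, col_end):
    """Stack one fixed horizontal line segment taken from every frame.

    frames: (num_views, height, width, channels), ordered by camera angle
            (assumed layout).
    Returns an array of shape (num_views, col_end - col_start, channels),
    where each row is the chosen segment as seen from one view.
    """
    frames = np.asarray(frames)
    if frames.ndim != 4:
        raise ValueError("frames must have shape (views, height, width, channels)")
    return frames[:, row, col_start:col_end, :].copy()

# Synthetic stand-in for rendered views: a vertical edge that shifts by
# one pixel per view, as a 3D-consistent surface point would under a
# smooth horizontal camera rotation.
num_views, height, width = 8, 16, 32
frames = np.zeros((num_views, height, width, 3), dtype=np.float32)
for view in range(num_views):
    frames[view, :, 10 + view:, :] = 1.0

epi = spatiotemporal_line_texture(frames, row=8, col_start=4, col_end=28)
```

In this toy case the edge traces a straight slanted line through `epi`. Frame-to-frame jitter in that trace is the kind of distortion the caption attributes to 3D inconsistency.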