SuperNeRF-GAN: A Universal 3D-Consistent Super-Resolution Framework for Efficient and Enhanced 3D-Aware Image Synthesis

Peng Zheng, Linzhi Huang, Yizhou Yu, Yi Chang, Yilin Wang, Rui Ma

TL;DR

SuperNeRF-GAN addresses the challenge of achieving 3D-consistent high-resolution image synthesis with NeRF-based generators by introducing a NeRF Super-Resolution module and a boundary-aware, depth-guided rendering pipeline. The approach preserves 3D structure across viewpoints while significantly reducing sampling requirements via a boundary-correct multi-depth map and normal-guided depth super-resolution. It is designed as a universal framework that can be paired with pre-trained 3D generators (e.g., EG3D) to boost HR quality across portraits, animals, and full-body scenes, outperforming task-specific baselines in 3D-consistency and efficiency, with modest trade-offs in per-image quality. The work advances practical 3D-aware synthesis by delivering scalable, globally consistent HR outputs suitable for VR, games, and related applications.

Abstract

Neural volume rendering techniques, such as NeRF, have revolutionized 3D-aware image synthesis by enabling the generation of images of a single scene or object from various camera poses. However, the high computational cost of NeRF presents challenges for synthesizing high-resolution (HR) images. Most existing methods address this issue by leveraging 2D super-resolution, which compromises 3D-consistency. Other methods propose radiance manifolds or two-stage generation to achieve 3D-consistent HR synthesis, yet they are limited to specific synthesis tasks, reducing their universality. To tackle these challenges, we propose SuperNeRF-GAN, a universal framework for 3D-consistent super-resolution. A key highlight of SuperNeRF-GAN is its seamless integration with NeRF-based 3D-aware image synthesis methods: it enhances the resolution of generated images while preserving 3D-consistency and reducing computational cost. Specifically, given a pre-trained generator capable of producing a NeRF representation such as a tri-plane, we first perform volume rendering to obtain a low-resolution image with corresponding depth and normal maps. Then, we employ a NeRF Super-Resolution module, which learns a network to obtain a high-resolution NeRF. Next, we propose a novel Depth-Guided Rendering process consisting of three simple yet effective steps: the construction of a boundary-correct multi-depth map through depth aggregation, normal-guided depth super-resolution, and depth-guided NeRF rendering. Experimental results demonstrate the superior efficiency, 3D-consistency, and quality of our approach. Additionally, ablation studies confirm the effectiveness of our proposed components.
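The abstract does not spell out how depth aggregation builds the multi-depth map; the Figure 2 caption only says that dilation and erosion are applied to the LR depth map. The following is a minimal sketch under that reading, not the authors' implementation: it treats erosion and dilation as square min and max filters and stacks the results as candidate depth layers, so that pixels near a depth discontinuity carry both the near and the far depth. The function names, the window radius, and the three-layer layout are all assumptions made for illustration.

```python
import numpy as np

def local_extreme(depth, radius, reduce_fn):
    """Square min/max filter with edge padding (hypothetical helper)."""
    padded = np.pad(depth, radius, mode="edge")
    h, w = depth.shape
    k = 2 * radius + 1
    # Collect every shifted view of the window, then reduce across them.
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return reduce_fn(np.stack(shifts, axis=0), axis=0)

def multi_depth_map(depth, radius=1):
    """Stack eroded, original, and dilated depth as candidate layers.

    Assumption: the layer layout is illustrative, not taken from the paper.
    """
    eroded = local_extreme(depth, radius, np.min)   # grey-scale erosion: nearest depth in window
    dilated = local_extreme(depth, radius, np.max)  # grey-scale dilation: farthest depth in window
    return np.stack([eroded, depth, dilated], axis=0)

# Toy LR depth map with a vertical depth discontinuity between columns 1 and 2.
depth = np.array([[1.0, 1.0, 5.0, 5.0],
                  [1.0, 1.0, 5.0, 5.0],
                  [1.0, 1.0, 5.0, 5.0]])
layers = multi_depth_map(depth)
```

In this toy case the two columns adjacent to the discontinuity end up with both depth values (1.0 and 5.0) across the layers, while columns away from the edge keep a single value. A renderer guided by such a map could then place samples near each candidate depth instead of along the whole ray.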

Paper Structure

This paper contains 40 sections, 10 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Effectiveness of our proposed SuperNeRF-GAN. The images in the first row are synthesized by existing pre-trained models, without the use of 2D image super-resolution. The second row shows the images super-resolved by SuperNeRF-GAN in a 3D-consistent way. Please zoom in to see the detailed differences between the original and super-resolved images.
  • Figure 2: Pipeline of the proposed SuperNeRF-GAN framework. Given a random noise $z$, the pre-trained generator of existing 3D generative models maps it to a low-resolution (LR) NeRF representation. This LR representation can be rendered into a corresponding LR image along with depth and normal maps. Next, the LR NeRF representation undergoes the NeRF Super-Resolution module to produce a high-resolution (HR) NeRF representation. Simultaneously, Dilation and Erosion for Depth Aggregation and Normal-Guided Depth Super-Resolution are applied to the LR depth map to construct a boundary-correct multi-depth map. This map guides the rendering process of the HR NeRF representation, enabling efficient and 3D-consistent HR image synthesis.
  • Figure 3: The left two figures demonstrate the effectiveness of Depth Aggregation (DA). Note that these results are synthesized by untrained SuperNeRF-GAN models for better demonstration. The right two figures compare different DA techniques, highlighting that Neighbor-aware DA in SH-HD introduces noticeable artifacts, especially at depth discontinuities.
  • Figure 4: Effectiveness of Normal-Guided Depth Super-Resolution. The dashed rectangle highlights inaccuracies at depth discontinuities when using bilinear interpolation, which result in artifacts in the synthesized image.
  • Figure 5: Qualitative comparison among 3D-aware image synthesis methods. The results of other methods are taken from their respective papers to ensure a fair and consistent comparison. Since the 3D-inconsistency might not be evident in static images, we provide additional comparisons of 3D-consistency in our Supplementary Video, where StyleNeRF, StyleSDF, and EG3D show noticeable inconsistencies.
  • ...and 4 more figures
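
The Figure 4 caption contrasts normal-guided depth super-resolution with bilinear interpolation but does not give the formulation. One geometric way a surface normal can guide depth upsampling is to extend the local tangent plane: given a known surface point and its normal, the depth along a nearby ray is where that ray meets the plane. The sketch below shows only this plane-ray intersection. It assumes camera-space rays scaled so that their z component is 1 (so the ray parameter equals z-depth); the function name and this convention are assumptions for illustration, not the paper's method.

```python
import numpy as np

def plane_depth(depth, normal, ray, new_ray):
    """Depth along new_ray where it meets the tangent plane at the known point.

    depth   : z-depth of the known surface point along `ray`
    normal  : surface normal in camera space (need not be unit length)
    ray     : direction of the known ray, scaled so that z == 1
    new_ray : direction of the query ray, scaled so that z == 1
    """
    point = depth * ray                       # known 3D surface point
    denom = float(np.dot(normal, new_ray))
    if abs(denom) < 1e-8:                     # query ray is parallel to the plane
        return float(depth)                   # fall back to the known depth
    # Solve normal . (t * new_ray - point) = 0 for t.
    return float(np.dot(normal, point)) / denom

# Slanted plane z - x = 2, whose normal is proportional to (-1, 0, 1).
normal = np.array([-1.0, 0.0, 1.0])
center_ray = np.array([0.0, 0.0, 1.0])        # meets the plane at depth 2
query_ray = np.array([0.5, 0.0, 1.0])

slanted = plane_depth(2.0, normal, center_ray, query_ray)
frontal = plane_depth(2.0, np.array([0.0, 0.0, 1.0]), center_ray, query_ray)
```

For the slanted plane the query ray meets the surface at depth 4, whereas a fronto-parallel plane keeps the depth at 2. Interpolating depth values alone cannot tell these two cases apart, which is the kind of information a normal map adds.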