HyperNeRFGAN: Hypernetwork approach to 3D NeRF GAN

Adam Kania, Artur Kasymov, Jakub Kościukiewicz, Artur Górak, Marcin Mazur, Maciej Zięba, Przemysław Spurek

TL;DR

HyperNeRFGAN introduces a hypernetwork that maps a latent Gaussian vector to the weights of a simplified NeRF $F_{\theta}$ that does not rely on viewing directions, enabling 3D-aware image synthesis trained with a 2D discriminator. The model leverages factorized multiplicative modulation within a NeRF backbone and a reduced coarse network, trained under a StyleGAN2-like objective to render diverse 2D views from 3D objects. Across ShapeNet, CARLA, CelebA, and medical DRR datasets, HyperNeRFGAN achieves competitive or superior performance relative to state-of-the-art 3D-aware methods, with particular strength when camera pose data is unavailable or scarce. The approach offers practical impact for 3D generation in domains such as medical imaging, where acquiring pose information is challenging, by delivering 3D-consistent renderings from unlabeled 2D views.
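
To make the hypernetwork idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one common form of factorized multiplicative modulation, in which a shared weight matrix is scaled elementwise by `sigmoid(A @ B)` and the low-rank factors `A` and `B` are predicted from the latent vector. The class `FMMLayer`, the function `radiance_field`, and all layer sizes, ranks, and initializations are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class FMMLayer:
    """Linear layer whose weights are modulated by a latent code (illustrative only)."""

    def __init__(self, rng, latent_dim, in_dim, out_dim, rank):
        self.in_dim, self.out_dim, self.rank = in_dim, out_dim, rank
        # Weights shared by every generated object.
        self.shared_w = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (out_dim, in_dim))
        self.bias = np.zeros(out_dim)
        # Hypernetwork heads (assumed linear here): latent -> low-rank factors A and B.
        self.to_a = rng.normal(0.0, 0.1, (latent_dim, out_dim * rank))
        self.to_b = rng.normal(0.0, 0.1, (latent_dim, rank * in_dim))

    def weights(self, z):
        a = (z @ self.to_a).reshape(self.out_dim, self.rank)
        b = (z @ self.to_b).reshape(self.rank, self.in_dim)
        # Assumed modulation rule: W = W_shared * sigmoid(A @ B), elementwise.
        return self.shared_w * sigmoid(a @ b)

    def __call__(self, z, x):
        return x @ self.weights(z).T + self.bias


def radiance_field(layers, z, points):
    """Map 3D points to (rgb, density) using position only, no viewing direction."""
    h = points
    for layer in layers[:-1]:
        h = np.maximum(layer(z, h), 0.0)  # ReLU hidden activations
    out = layers[-1](z, h)
    rgb = sigmoid(out[:, :3])             # colors in [0, 1]
    density = np.maximum(out[:, 3], 0.0)  # non-negative density
    return rgb, density


rng = np.random.default_rng(0)
layers = [
    FMMLayer(rng, latent_dim=16, in_dim=3, out_dim=32, rank=4),
    FMMLayer(rng, latent_dim=16, in_dim=32, out_dim=32, rank=4),
    FMMLayer(rng, latent_dim=16, in_dim=32, out_dim=4, rank=4),
]
z = rng.standard_normal(16)              # latent Gaussian vector
points = rng.uniform(-1.0, 1.0, (5, 3))  # query positions
rgb, density = radiance_field(layers, z, points)
```

Sampling a different `z` yields a different set of layer weights, and therefore a different object, while the query positions stay the same.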

Abstract

The recent surge in popularity of deep generative models for 3D objects has highlighted the need for more efficient training methods, particularly given the difficulties of training on conventional 3D representations such as voxels or point clouds. Neural Radiance Fields (NeRFs), which set the current benchmark in quality for generating novel views of complex 3D scenes from a limited set of 2D images, represent a promising solution to this challenge. However, training these models requires knowledge of the camera positions from which the images were captured. In this paper, we overcome this limitation by introducing HyperNeRFGAN, a Generative Adversarial Network (GAN) architecture that employs the hypernetwork paradigm to transform Gaussian noise into the weights of a NeRF architecture that does not use viewing directions during training. As our experimental study shows, the proposed model, despite being notably simpler than existing state-of-the-art alternatives, demonstrates superior performance on a diverse range of image datasets where camera position estimation is challenging, particularly for medical data.
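
As an illustration of how a radiance field without viewing directions can still produce 2D images for a 2D discriminator, the sketch below applies the standard NeRF volume-rendering quadrature along a single ray. It is a simplified example under stated assumptions, not the paper's renderer: `render_ray`, `toy_field`, the sampling bounds, and the sample count are all hypothetical, and the toy field stands in for a network whose weights a hypernetwork would generate.

```python
import numpy as np


def render_ray(field, origin, direction, near=0.5, far=2.5, n_samples=32):
    """Composite colors along one ray using the standard NeRF quadrature."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction
    rgb, sigma = field(points)               # the field sees positions only
    delta = np.append(np.diff(t), 1e10)      # distance between adjacent samples
    alpha = 1.0 - np.exp(-sigma * delta)     # opacity of each segment
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = trans * alpha
    return weights @ rgb, weights


def toy_field(points):
    """Hypothetical stand-in field: a dense red sphere of radius 0.5 at the origin."""
    inside = np.linalg.norm(points, axis=1) < 0.5
    rgb = np.tile(np.array([1.0, 0.0, 0.0]), (len(points), 1))
    sigma = np.where(inside, 50.0, 0.0)
    return rgb, sigma


direction = np.array([0.0, 0.0, 1.0])
hit_color, hit_weights = render_ray(toy_field, np.array([0.0, 0.0, -1.5]), direction)
miss_color, miss_weights = render_ray(toy_field, np.array([2.0, 0.0, -1.5]), direction)
```

Because the color returned by the field depends only on position, every ray that hits the same surface point sees the same color regardless of the camera it came from.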

Paper Structure (13 sections, 5 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Our HyperNeRFGAN model employs a hypernetwork to convert Gaussian noise into the weights of a simplified NeRF architecture (which does not require information about camera positions); the resulting NeRF is then used to generate novel 2D views. Training uses a standard GAN framework with a typical 2D discriminator. Although the model generates 2D images, it relies on a 3D-aware NeRF representation, which facilitates precise 3D object generation.
  • Figure 2: Qualitative comparison of HyperNeRFGAN (ours) with HoloGAN nguyen2019hologan, GRAF schwarz2020graf, and $\pi$-GAN chan2021pi trained on the CARLA dataset dosovitskiy2017carla. Our model produces results comparable to those of the strongest competitor, $\pi$-GAN.
  • Figure 3: Sample 2D images generated by the HyperNeRFGAN model (ours) trained on the ShapeNet-based dataset proposed in zimny2022points2nerf, consisting of 50 images of each object from the car, plane, and chair classes.
  • Figure 4: Sample 2D images generated by the HyperNeRFGAN model (ours) trained on the CARLA dataset dosovitskiy2017carla. Note that our method effectively models the transparency of car windows.
  • Figure 5: Qualitative comparison between HyperNeRFGAN (ours) and MedNeRF, both trained on the medical dataset of digitally reconstructed radiographs (DRR) of knees and chests coronafigueroa2022mednerf. Our model shows a significant improvement in the reconstruction quality of CT projections.
  • ...and 6 more figures