HyperNeRFGAN: Hypernetwork approach to 3D NeRF GAN
Adam Kania, Artur Kasymov, Jakub Kościukiewicz, Artur Górak, Marcin Mazur, Maciej Zięba, Przemysław Spurek
TL;DR
HyperNeRFGAN introduces a hypernetwork that maps a latent Gaussian vector to the weights of a simplified NeRF $F_{\theta}$ that does not take viewing directions as input, enabling 3D-aware image synthesis trained with a 2D discriminator. The model uses factorized multiplicative modulation within a NeRF backbone with a reduced coarse network, and is trained under a StyleGAN2-like objective to render diverse 2D views of 3D objects. Across ShapeNet, CARLA, CelebA, and medical DRR (digitally reconstructed radiograph) datasets, HyperNeRFGAN achieves performance competitive with or superior to state-of-the-art 3D-aware methods, and is particularly strong when camera pose data is unavailable or scarce. The approach is of practical value for 3D generation in domains such as medical imaging, where pose information is hard to acquire, because it produces 3D-consistent renderings from unlabeled 2D views.
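The minimal sketch below illustrates the core idea of the TL;DR: a hypernetwork turns a latent Gaussian vector into the weights of a small NeRF-style MLP that takes only 3D coordinates (no viewing direction) and outputs a density and an RGB color. It assumes a factorized multiplicative modulation (FMM) in the style of INR-GAN, where each layer's weight is a shared matrix modulated elementwise by a sigmoid of a low-rank product $AB$ predicted from the latent. The layer sizes, the rank, the single linear hypernetwork head per layer, and all names (`hypernetwork`, `nerf`, `shared_W`, `hyper_heads`) are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
LATENT_DIM = 64               # dimension of the Gaussian latent z
RANK = 4                      # rank r of the factorized modulation
LAYER_SIZES = [3, 32, 32, 4]  # xyz in (no view direction) -> (sigma, r, g, b) out

layer_dims = list(zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]))  # (n_in, n_out) per layer

# Latent-independent base weights W_s and biases shared by every generated NeRF.
shared_W = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)) for n_in, n_out in layer_dims]
shared_b = [np.zeros(n_out) for _, n_out in layer_dims]

# Hypothetical hypernetwork heads: one linear map per layer from z to the flattened factors A, B.
hyper_heads = [rng.normal(0.0, 0.1, size=(LATENT_DIM, (n_out + n_in) * RANK)) for n_in, n_out in layer_dims]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hypernetwork(z):
    """Map a latent vector z to one weight matrix per NeRF layer via FMM (assumed form)."""
    weights = []
    for W_s, H, (n_in, n_out) in zip(shared_W, hyper_heads, layer_dims):
        factors = z @ H
        A = factors[: n_out * RANK].reshape(n_out, RANK)
        B = factors[n_out * RANK :].reshape(RANK, n_in)
        # FMM: elementwise modulation of the shared weights by a rank-r sigmoid mask.
        weights.append(W_s * sigmoid(A @ B))
    return weights


def nerf(points, weights):
    """Evaluate the generated NeRF at 3D points; the viewing direction is not an input."""
    h = points
    for i, (W, b) in enumerate(zip(weights, shared_b)):
        h = h @ W.T + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers
    sigma = np.maximum(h[:, 0], 0.0)  # non-negative volume density
    rgb = sigmoid(h[:, 1:])           # colors squashed into [0, 1]
    return sigma, rgb


z = rng.standard_normal(LATENT_DIM)           # latent Gaussian vector
weights = hypernetwork(z)                     # weights theta of F_theta
points = rng.uniform(-1.0, 1.0, size=(5, 3))  # 5 query points in 3D
sigma, rgb = nerf(points, weights)
```

Because only the low-rank factors depend on $z$, the hypernetwork in this sketch emits $(n_{out}+n_{in})r$ numbers per layer instead of $n_{out} n_{in}$ (256 instead of 1024 for the 32×32 hidden layer), which is the main appeal of the factorization.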
Abstract
The recent surge in popularity of deep generative models for 3D objects has highlighted the need for more efficient training methods, particularly given the difficulties of training with conventional 3D representations such as voxels or point clouds. Neural Radiance Fields (NeRFs), which currently set the quality benchmark for generating novel views of complex 3D scenes from a limited set of 2D images, are a promising solution to this challenge. However, training these models requires knowledge of the camera positions from which the images were taken. In this paper, we overcome this limitation by introducing HyperNeRFGAN, a Generative Adversarial Network (GAN) architecture that employs a hypernetwork paradigm to transform Gaussian noise into the weights of a NeRF architecture that does not use viewing directions during training. Our experimental study shows that the proposed model, despite being notably simpler than existing state-of-the-art alternatives, achieves superior performance on a diverse range of image datasets where camera position estimation is challenging, particularly medical data.
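A NeRF produces 2D views through volume rendering. The sketch below shows the standard NeRF quadrature for a single ray, $\hat{C} = \sum_i T_i \left(1 - e^{-\sigma_i \delta_i}\right) c_i$ with $T_i = \exp\left(-\sum_{j<i} \sigma_j \delta_j\right)$, where $\delta_i$ is the distance between consecutive samples. It is an assumption that HyperNeRFGAN renders exactly this way; ray generation, camera sampling, and hierarchical coarse/fine sampling are omitted, and the function name `render_ray` is hypothetical.

```python
import numpy as np


def render_ray(sigma, rgb, t_vals):
    """Alpha-composite per-sample densities and colors along one ray.

    sigma: (N,) non-negative densities, rgb: (N, 3) colors,
    t_vals: (N,) increasing sample depths along the ray.
    Returns the pixel color (3,) and the per-sample weights (N,).
    """
    # Distance between consecutive samples; the last interval is treated as very long.
    deltas = np.diff(t_vals, append=1e10)
    alpha = 1.0 - np.exp(-sigma * deltas)  # opacity of each segment
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): probability the ray reaches sample i.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights


t = np.linspace(2.0, 6.0, 8)
colors = np.tile([0.2, 0.5, 0.9], (8, 1))
empty_color, _ = render_ray(np.zeros(8), colors, t)  # empty space -> black pixel
```

With zero density everywhere the weights vanish and the pixel is black, while a single very dense sample at the front of the ray makes the pixel take that sample's color. Pixels rendered this way from densities and colors queried along camera rays are what a 2D discriminator would see.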
