MultiPlaneNeRF: Neural Radiance Field with Non-Trainable Representation

Dominik Zimny, Artur Kasymov, Adam Kania, Jacek Tabor, Maciej Zięba, Marcin Mazur, Przemysław Spurek

TL;DR

MultiPlaneNeRF addresses NeRF’s per-object training burden and limited generalization by replacing trainable 3D representations with fixed 2D image inputs and a small implicit decoder, learned over a large dataset. The approach achieves competitive view synthesis with far fewer trainable parameters and demonstrates generalization to unseen objects and cross-class interpolation, while enabling an interpretable GAN component (MultiPlaneGAN) for integration with broader generative models. The work highlights a practical path toward scalable, generalizable neural rendering by decoupling representation from the learnable decoder and leveraging fixed image bases. This offers a tractable alternative to full 3D supervision and opens avenues for efficient 3D-aware generation in complex pipelines.

Abstract

NeRF is a popular model that efficiently represents 3D objects from 2D images. However, vanilla NeRF has some important limitations. NeRF must be trained on each object separately, and training takes a long time because the object's shape and color are encoded in the neural network weights. Moreover, NeRF does not generalize well to unseen data. In this paper, we present MultiPlaneNeRF, a model that addresses all of these problems simultaneously. Our model works directly on 2D images: we project 3D points onto 2D images to produce non-trainable representations. The projection step is not parametrized, and a very shallow decoder can efficiently process the representation. Furthermore, we can train MultiPlaneNeRF on a large dataset and force our implicit decoder to generalize across many objects. Consequently, to produce a NeRF representation of a new object, we only need to replace the 2D images, without additional training. In the experimental section, we demonstrate that MultiPlaneNeRF achieves results comparable to state-of-the-art models for synthesizing new views and can generalize to unseen objects. Additionally, the MultiPlane decoder can be used as a component in large generative models such as GANs.
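
The projection step and the shallow decoder can be sketched in a few lines. This is a minimal, hypothetical sketch and not the authors' implementation. It assumes the following:

- a pinhole camera with intrinsics `K`, a world-to-camera rotation `R`, and a translation `t`;
- images stored as row-major lists of RGB tuples in [0, 1];
- raw, unnormalized pixel coordinates;
- a one-hidden-layer MLP decoder with a sigmoid on color and a ReLU on density, the activations used in vanilla NeRF.

The function names `project`, `bilinear`, `multiplane_feature`, `init_params`, and `tiny_decoder` are illustrative. Each image contributes an interpolated color and a 2D position, so every image adds five numbers to the feature vector. Nothing in that vector is trained; only the decoder weights would be.

```python
import math
import random

# Assumed data layout (hypothetical, not from the paper):
#   image  : list of H rows, each a list of W (r, g, b) tuples with values in [0, 1]
#   camera : (K, R, t) with 3x3 intrinsics K, 3x3 rotation R, 3-vector t (world -> camera)


def project(point, K, R, t):
    """Pinhole projection of a world-space point to pixel coordinates (u, v).

    Assumes the point lies in front of the camera (positive depth).
    """
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    p = [sum(K[i][j] * xc[j] for j in range(3)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2]


def bilinear(image, u, v):
    """Blend the colors of the four pixels around (u, v); u is the column, v the row."""
    h, w = len(image), len(image[0])
    u = min(max(u, 0.0), w - 1.0)  # clamp so the 2x2 neighborhood stays in the image
    v = min(max(v, 0.0), h - 1.0)
    i0, j0 = int(math.floor(u)), int(math.floor(v))
    i1, j1 = min(i0 + 1, w - 1), min(j0 + 1, h - 1)
    a, b = u - i0, v - j0
    return tuple(
        (1 - a) * (1 - b) * image[j0][i0][c] + a * (1 - b) * image[j0][i1][c]
        + (1 - a) * b * image[j1][i0][c] + a * b * image[j1][i1][c]
        for c in range(3)
    )


def multiplane_feature(point, images, cameras):
    """Non-trainable representation: concatenate [R, G, B, u, v] over all images (5 values each)."""
    z = []
    for image, (K, R, t) in zip(images, cameras):
        u, v = project(point, K, R, t)
        z.extend(bilinear(image, u, v))
        z.extend((u, v))  # raw pixel coordinates; any normalization is an unknown detail
    return z


def init_params(in_dim, hidden_dim, rng):
    """Random weights for the hypothetical decoder; in practice these are the trained parameters."""
    W1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    W2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(4)]
    return W1, [0.0] * hidden_dim, W2, [0.0] * 4


def tiny_decoder(z, params):
    """Hypothetical one-hidden-layer MLP mapping the feature z to (rgb, sigma)."""
    W1, b1, W2, b2 = params
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z)) + bias) for row, bias in zip(W1, b1)]
    out = [sum(w * hv for w, hv in zip(row, hidden)) + bias for row, bias in zip(W2, b2)]
    rgb = tuple(1.0 / (1.0 + math.exp(-o)) for o in out[:3])  # sigmoid keeps color in (0, 1)
    sigma = max(0.0, out[3])  # ReLU keeps density non-negative
    return rgb, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    # Two random 4x4 images from two toy cameras (identity rotation) that see the origin at depth 3.
    images = [[[(rng.random(), rng.random(), rng.random()) for _ in range(4)]
               for _ in range(4)] for _ in range(2)]
    K = [[2.0, 0.0, 1.5], [0.0, 2.0, 1.5], [0.0, 0.0, 1.0]]
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cameras = [(K, R, [0.0, 0.0, 3.0]), (K, R, [0.1, 0.0, 3.0])]
    z = multiplane_feature((0.0, 0.0, 0.0), images, cameras)
    rgb, sigma = tiny_decoder(z, init_params(len(z), 16, rng))
    print(len(z), rgb, sigma)
```

The first printed value is 10, five per image, followed by an untrained color and density. With trained decoder weights, replacing `images` with another object's views is the only change needed to query a new object. This matches the abstract's claim that only the 2D images need to be replaced.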

Paper Structure (17 sections, 12 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: In the MultiPlaneNeRF approach, we divide the 2D training images into two parts. The first part builds a 2D representation and is used as input to a small implicit decoder. The second part is used as a vanilla NeRF training dataset. The representation of a 3D object, consisting of $n$ 2D images, is part of the architecture. The implicit decoder takes the coordinates of a 3D point $(x,y,z)$ and projects them onto the given 2D images. The aggregated information from the projected pixels, $Z_{(x,y,z)} \in \mathbb{R}^{5k}$, is then used to predict the RGB color and the volume density $\sigma$.
  • Figure 2: Neural implicit representations use fully connected layers with positional encoding to represent a scene (a). Explicit voxel grids, or hybrid variants using small implicit decoders, are fast but scale poorly with resolution (b). The hybrid explicit-implicit TriPlane representation is fast and scales well, but its parameters must be trained (c). In the hybrid explicit-implicit MultiPlane representation, existing images serve as the representation and a small implicit decoder aggregates their information (d). Trainable parameters of the respective models are marked in red.
  • Figure 3: Visualization of renders produced by MultiPlaneNeRF on NeRF Synthetic dataset scenes: Lego, Mic, Ship, Hotdog, Drums, Ficus.
  • Figure 4: For an input 3D point ${\bf x} =(x,y,z)$, we project it onto image $I$ and obtain the 2D coordinate $Pr( {\bf x} ,I)$. We then bilinearly interpolate the colors of the four closest pixels, $RGB_{i,j}$, $RGB_{i+1,j}$, $RGB_{i,j+1}$, $RGB_{i+1,j+1}$, to estimate the color $RGB_{Pr( {\bf x} ,I)}$ at position $Pr( {\bf x} ,I)$. The color and position $[ RGB_{Pr( {\bf x} ,I)}, Pr( {\bf x} ,I) ]$ are the input to the implicit decoder.
  • Figure 5: PSNR as a function of the number of images used in the object representation. We train MultiPlaneNeRF for 40k epochs. Our model obtains better results as the number of images in the representation increases.
  • ...and 7 more figures
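
Figure 5 reports quality as PSNR. For reference, the sketch below implements the standard definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. It assumes pixel values scaled to [0, 1], so that MAX = 1. The paper's exact evaluation protocol (color space, per-image averaging) is not stated here and may differ. The function name `psnr` is illustrative. A higher PSNR means a lower mean squared error against the ground-truth view.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two flat, equal-length lists of pixel values."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and have the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # MSE = 0.02 / 3, so PSNR = 10 * log10(150), about 21.76 dB
    print(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))
```

For 8-bit images, pass `max_val=255` instead of rescaling the pixel values.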