
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

Hanxin Zhu, Tianyu He, Xin Li, Bingchen Li, Zhibo Chen

TL;DR

This paper proposes the multi-input MLP (mi-MLP), which incorporates the inputs of the vanilla MLP into each layer to prevent overfitting without harming the synthesis of fine details. It further models colors and volume density separately and introduces two regularization terms.

Abstract

Neural Radiance Field (NeRF) has achieved superior performance for novel view synthesis by modeling the scene with a Multi-Layer Perceptron (MLP) and a volume rendering procedure. However, when only a few known views are given (i.e., few-shot view synthesis), the model is prone to overfitting the given views. Previous efforts to handle this issue have either leveraged learned priors or introduced additional regularizations. In contrast, this paper provides, for the first time, an orthogonal method from the perspective of network structure. We observe that trivially reducing the number of model parameters alleviates overfitting, but at the cost of missing details. Building on this observation, we propose the multi-input MLP (mi-MLP), which incorporates the inputs of the vanilla MLP (i.e., location and viewing direction) into each layer to prevent overfitting without harming the synthesis of fine details. To further reduce artifacts, we model colors and volume density separately and present two regularization terms. Extensive experiments on multiple datasets demonstrate that: 1) although the proposed mi-MLP is easy to implement, it is surprisingly effective, boosting the PSNR of the baseline from $14.73$ to $24.23$; and 2) the overall framework achieves state-of-the-art results on a wide range of benchmarks. We will release the code upon publication.
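A minimal sketch of the per-layer input incorporation idea, written in plain Python so that it runs without dependencies. It assumes that "incorporating the inputs into each layer" means concatenating the raw 6-D input (location and viewing direction) with the previous hidden activation before every layer after the first, including the output head. The helper names (`build_mi_mlp`, `mi_mlp_forward`), the layer sizes, the initialization, and the absence of positional encoding are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical init: Gaussian weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Affine map W x + b on plain Python lists."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def build_mi_mlp(in_dim, hidden, depth, out_dim, seed=0):
    """Assumed mi-MLP layout: every layer after the first sees [hidden, raw input]."""
    rng = random.Random(seed)
    layers = [init_layer(in_dim, hidden, rng)]
    for _ in range(depth - 1):
        # Input width grows by in_dim because the raw inputs are re-injected.
        layers.append(init_layer(hidden + in_dim, hidden, rng))
    head = init_layer(hidden + in_dim, out_dim, rng)
    return layers, head


def mi_mlp_forward(model, x):
    """Forward pass; `h + x` is list concatenation, i.e. re-injecting the inputs."""
    layers, head = model
    h = relu(linear(layers[0], x))
    for layer in layers[1:]:
        h = relu(linear(layer, h + x))
    return linear(head, h + x)
```

For example, `mi_mlp_forward(build_mi_mlp(6, 32, 4, 4), [0.1, -0.2, 0.3, 0.0, 0.0, 1.0])` returns four raw outputs (which could be read as RGB plus density under this sketch's assumptions). A vanilla MLP would differ only in feeding `h` instead of `h + x` to each later layer.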

Paper Structure

This paper contains 24 sections, 12 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of vanilla MLP vs. mi-MLP. Although mi-MLP is easy to implement, it is surprisingly effective as it boosts the PSNR of the baseline from $14.73$ to $24.23$.
  • Figure 2: Network structure of our proposed method. To avoid overfitting in few-shot view synthesis, we propose the multi-input MLP (mi-MLP), which incorporates the inputs (i.e., location $(x,y,z)$ and viewing direction $(d_x,d_y,d_z)$) into each layer of the MLP (see Sec. "Per-layer Inputs Incorporation"). To further improve geometry recovery, we model volume density and colors separately with different frequencies (see Sec. "Modeling Colors and Volume Density Separately").
  • Figure 3: Averaged gradient amplitude of each MLP layer at the beginning of training. (a) All layers in the vanilla MLP have gradients of similar amplitude. (b) In contrast, with mi-MLP the deeper layers (i.e., those close to the outputs) are updated with large gradients, while the shallower layers are updated with extremely small ones.
  • Figure 4: Background regularization. In addition to sampling target pixels within the image space (i.e., the red dots) to generate training rays, we also sample target pixels outside the image space (i.e., the blue dots) to address background artifacts in object-centric scenes.
  • Figure 5: Sampling annealing. During the early stage of training, fewer points are sampled along each ray so that the network focuses on coarse geometry estimation; more sampling points are used in the later stage to recover details.
  • ...and 3 more figures
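Sampling annealing (Figure 5) can be sketched as a schedule that grows the number of points per ray as training progresses, combined with standard NeRF-style stratified sampling along the ray. The linear ramp, the start and end counts, `anneal_steps`, and the function names below are hypothetical choices for illustration; the paper's exact schedule is not stated here.

```python
import random


def num_samples_at(step, n_start=16, n_end=64, anneal_steps=5000):
    """Hypothetical linear ramp: n_start points at step 0, n_end from anneal_steps on."""
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return int(round(n_start + frac * (n_end - n_start)))


def stratified_t(near, far, n, rng):
    """Split [near, far) into n equal bins and draw one uniform depth per bin."""
    width = (far - near) / n
    return [near + (i + rng.random()) * width for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 2500, 5000):
        n = num_samples_at(step)
        ts = stratified_t(2.0, 6.0, n, rng)
        print(step, n, round(ts[0], 3))
```

Under these assumptions, early steps use 16 points per ray (coarse geometry) and the count rises linearly to 64 (detail recovery); a step-wise or cosine ramp would be an equally plausible alternative.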