
MomentsNeRF: Leveraging Orthogonal Moments for Few-Shot Neural Rendering

Ahmad AlMughrabi, Ricardo Marques, Petia Radeva

TL;DR

MomentsNeRF addresses the challenge of one- and few-shot neural rendering by embedding orthogonal moment representations into a CNN encoder to produce a rich feature volume for NeRF. By integrating Gabor and Zernike moments and utilizing Zernike convolutional layers, the method enhances texture detail and reduces artifacts in sparse-view rendering, achieving state-of-the-art gains on DTU MVS and ShapeNet while training faster than prior approaches. The approach demonstrates how structured moment-based features can improve cross-view transfer and per-scene optimization, with practical implications for real-time or data-scarce 3D scene synthesis. The work also outlines future directions to extend moment families and robustness to diverse, real-world scenes.

Abstract

We propose MomentsNeRF, a novel framework for one- and few-shot neural rendering that predicts a neural representation of a 3D scene using Orthogonal Moments. Our architecture offers a new transfer learning method that trains on multiple scenes and incorporates per-scene optimization using one or a few images at test time. Our approach is the first to successfully harness features extracted from Gabor and Zernike moments, seamlessly integrating them into the NeRF architecture. We show that MomentsNeRF synthesizes images with complex textures and shapes better than recent one- and few-shot neural rendering frameworks, achieving significant noise reduction, eliminating artifacts, and completing missing parts. Extensive experiments on the DTU and ShapeNet datasets show that MomentsNeRF improves on the state of the art by 3.39 dB PSNR, 11.1% SSIM, 17.9% LPIPS, and 8.3% DISTS. Moreover, it outperforms state-of-the-art methods in both novel view synthesis and single-image 3D view reconstruction. The source code is accessible at: https://amughrabi.github.io/momentsnerf/.
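
To put the 3.39 dB figure in perspective: PSNR is a logarithmic function of mean squared error (MSE), so a fixed dB gain corresponds to a fixed multiplicative reduction in MSE. The worked arithmetic below is only a rough reading. It assumes both methods are scored against the same peak value, and it treats the gain as applying to a single PSNR value (with per-image averaging the relation is only approximate). The subscripts "base" and "ours" are labels introduced here, not the paper's notation.

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
\quad\Longrightarrow\quad
\Delta\mathrm{PSNR}
  = \mathrm{PSNR}_{\text{ours}} - \mathrm{PSNR}_{\text{base}}
  = 10 \log_{10}\!\left(\frac{\mathrm{MSE}_{\text{base}}}{\mathrm{MSE}_{\text{ours}}}\right)

\Delta\mathrm{PSNR} = 3.39\ \mathrm{dB}
\quad\Longrightarrow\quad
\frac{\mathrm{MSE}_{\text{ours}}}{\mathrm{MSE}_{\text{base}}}
  = 10^{-3.39/10} \approx 0.458
```

Under these assumptions, a 3.39 dB gain corresponds to a bit less than half the mean squared error of the baseline.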

Paper Structure

This paper contains 29 sections, 21 equations, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Qualitative comparison on the DTU dataset in the 3-view setting. We show novel views rendered by PixelNeRF and by our model alongside the reference image. Our model performs better at showing texture details, removing artifacts, handling missing data, and adjusting color.
  • Figure 2: The MomentsNeRF design for the multi-view scenario involves a multi-stage process. (a) Cost Volume: given a query point $x$ on a target camera ray with a specified view direction $d$, an image feature corresponding to this point is extracted from the feature volume $W^i$ through projection and interpolation operations. The Moments CNN encoder constructs the feature volume for every input during the (b) Moments Neural Encoding Volume stage. Subsequently, this feature is fed into the NeRF network MLP together with its spatial coordinates. The output of this network is represented by RGB and density values that are utilized in a (c) Volume Rendering stage, resulting in a synthesized image.
  • Figure 3: Even (left) and odd (right) Zernike polynomials, shown as 2D (top) and 3D (bottom) plots for four polynomials labeled with their classical names. The complete set of 15 Zernike polynomials can be found in the supplementary material.
  • Figure 4: Detailed illustration of the Moments CNN Encoder.
  • Figure 5: Comparison to the reference and PixelNeRF in the 1-view and 9-view settings on the DTU dataset.
  • ...and 14 more figures
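
The even and odd Zernike polynomials mentioned in the Figure 3 caption can be evaluated with the standard textbook definition, sketched below in Python with NumPy. This is an illustrative sketch, not the authors' implementation: the function names (`zernike_radial`, `zernike`, `zernike_indices`) are hypothetical, no normalization factor is applied, and the link between "15 polynomials" and radial orders n = 0..4 is an assumption based only on the count matching.

```python
import math
import numpy as np


def zernike_radial(n, m, rho):
    """Radial part R_n^|m|(rho) of a Zernike polynomial (standard definition)."""
    rho = np.asarray(rho, dtype=float)
    m = abs(m)
    # R_n^m is identically zero unless m <= n and n - m is even.
    if m > n or (n - m) % 2 != 0:
        return np.zeros_like(rho)
    total = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * math.factorial(n - k)) / (
            math.factorial(k)
            * math.factorial((n + m) // 2 - k)
            * math.factorial((n - m) // 2 - k)
        )
        total = total + coeff * rho ** (n - 2 * k)
    return total


def zernike(n, m, rho, theta):
    """Even (m >= 0, cosine) or odd (m < 0, sine) Zernike polynomial on the unit disk."""
    radial = zernike_radial(n, m, rho)
    if m >= 0:
        return radial * np.cos(m * np.asarray(theta, dtype=float))
    return radial * np.sin(-m * np.asarray(theta, dtype=float))


def zernike_indices(max_order):
    """All valid (n, m) pairs with n <= max_order and n - |m| even."""
    return [(n, m) for n in range(max_order + 1) for m in range(-n, n + 1, 2)]


if __name__ == "__main__":
    # Assumption: orders n = 0..4 give the 15 polynomials the caption mentions.
    print(len(zernike_indices(4)))
    # Defocus term R_2^0(rho) = 2*rho^2 - 1, evaluated at rho = 0.5.
    print(zernike_radial(2, 0, 0.5))
```

Sampling `zernike(n, m, rho, theta)` on a polar grid with `rho <= 1` would reproduce 2D plots of the kind the caption describes. How the paper turns these polynomials into convolutional filters is not specified in this excerpt.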