
FewShotNeRF: Meta-Learning-based Novel View Synthesis for Rapid Scene-Specific Adaptation

Piraveen Sivakumar, Paul Janson, Jathushan Rajasegaran, Thanuja Ambegoda

TL;DR

FewShotNeRF tackles the challenge of synthesizing novel views from limited multi-view images by meta-learning a NeRF initialization that can rapidly adapt to a new scene. It combines gradient-based meta-learning (Reptile) with hash-encoded representations to distill a robust 3D prior from many scenes. The hash encoding keeps each inner-loop NeRF optimization fast, and at test time the learned initialization adapts to a new scene from as few as 2–6 views. Evaluated on real-world objects from the CO3D dataset, it performs competitively against strong baselines, showing that a 3D prior can be learned through meta-learning alone, without external priors. The approach reduces the data and compute needed for per-scene NeRF fitting, making rapid scene-specific view synthesis practical at scale.
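The standard Reptile outer update moves a shared initialization θ toward the weights φ obtained after fitting one sampled scene: θ ← θ + ε(φ − θ). The sketch below illustrates only this update rule on a hypothetical toy problem and is not the authors' implementation. A two-parameter line stands in for the NeRF (MLP plus hash grid), and each "scene" is a line drawn around a shared category mean. Every name and hyperparameter (`inner_loop`, `sample_scene`, `epsilon`, step counts, learning rates) is an assumption.

```python
import random

def predict(params, x):
    # Toy stand-in for a NeRF: a line y = w * x + b instead of an MLP over hash features.
    w, b = params
    return w * x + b

def mse(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def inner_loop(params, data, steps=20, lr=0.1):
    """Fit one scene with plain gradient descent on MSE, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def sample_scene(rng, n_points=32):
    # Hypothetical "category": every scene is a line near slope 2 and offset -1.
    slope = 2.0 + rng.gauss(0.0, 0.1)
    offset = -1.0 + rng.gauss(0.0, 0.1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return [(x, slope * x + offset) for x in xs]

def reptile(meta_iters=500, epsilon=0.1, seed=0):
    """Meta-learn a shared initialization theta with the Reptile update."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        phi = inner_loop(theta, sample_scene(rng))  # adapt to one sampled scene
        # Reptile outer step: theta <- theta + epsilon * (phi - theta)
        theta = tuple(t + epsilon * (p - t) for t, p in zip(theta, phi))
    return theta

if __name__ == "__main__":
    theta = reptile()
    # Few-shot test time: a new scene observed at only three inputs ("views").
    new_scene = [(x, 2.05 * x - 0.95) for x in (-0.8, 0.1, 0.7)]
    from_meta = inner_loop(theta, new_scene, steps=5)
    from_zero = inner_loop((0.0, 0.0), new_scene, steps=5)
    print("meta-learned init:", mse(from_meta, new_scene))
    print("zero init:        ", mse(from_zero, new_scene))
```

The usage block mirrors the few-shot setting. Both runs take five steps on three samples from an unseen scene. The run that starts from the meta-learned θ reaches a much lower error than the one from a zero initialization, because θ already encodes the structure shared across the toy category.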

Abstract

In this paper, we address the challenge of generating novel views of real-world objects from a limited number of multi-view images with our proposed approach, FewShotNeRF. Our method uses meta-learning to acquire an optimal initialization that enables rapid adaptation of a Neural Radiance Field (NeRF) to a specific scene. The meta-learning process focuses on capturing the geometry and textures shared within a category and embedding them in the weight initialization. This speeds up NeRF learning. By also leveraging recent advances in positional encodings, we further reduce the time required to fit a NeRF to a scene, which accelerates the inner-loop optimization of meta-learning. Notably, our method makes it feasible to meta-learn over a large number of 3D scenes, establishing a robust 3D prior for various categories. Through extensive evaluations on the open-source Common Objects in 3D (CO3D) dataset, we empirically demonstrate the efficacy and potential of meta-learning for generating high-quality novel views of objects.
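The abstract credits recent positional encodings with cutting per-scene fitting time, and the TL;DR names hash-encoded representations. Assuming these refer to an Instant-NGP-style multiresolution hash encoding (the source does not give its configuration), a simplified lookup could look like the sketch below. The level count, table size, feature width and resolutions are all hypothetical. Unlike Instant-NGP, every level is hashed here, including coarse levels that would fit in a dense grid.

```python
import math
import random

# Spatial-hash primes used by Instant-NGP's multiresolution hash encoding.
PRIMES = (1, 2654435761, 805459861)

def spatial_hash(ix, iy, iz, table_size):
    # XOR the prime-scaled integer grid coordinates, then fold into the table.
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size

class HashEncoding:
    """Simplified multiresolution hash encoding (hypothetical sizes)."""

    def __init__(self, n_levels=4, table_size=2**12, n_features=2,
                 base_res=16, growth=2.0, seed=0):
        rng = random.Random(seed)
        self.resolutions = [int(base_res * growth ** level) for level in range(n_levels)]
        self.table_size = table_size
        # One trainable feature table per level, initialized uniformly in [-1e-4, 1e-4].
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)] for _ in range(table_size)]
            for _ in range(n_levels)
        ]

    def encode(self, x, y, z):
        """Map a point in [0, 1]^3 to concatenated, trilinearly interpolated features."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            px, py, pz = x * res, y * res, z * res
            x0, y0, z0 = math.floor(px), math.floor(py), math.floor(pz)
            fx, fy, fz = px - x0, py - y0, pz - z0
            feat = [0.0] * len(table[0])
            # Blend the features stored at the 8 corners of the enclosing grid cell.
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        weight = ((fx if dx else 1 - fx) *
                                  (fy if dy else 1 - fy) *
                                  (fz if dz else 1 - fz))
                        entry = table[spatial_hash(x0 + dx, y0 + dy, z0 + dz, self.table_size)]
                        for i, value in enumerate(entry):
                            feat[i] += weight * value
            out.extend(feat)
        return out

if __name__ == "__main__":
    enc = HashEncoding()
    print(len(enc.encode(0.3, 0.5, 0.7)))  # n_levels * n_features features
```

Each query touches only 8 table entries per level, so both the lookup and its gradient are cheap. In Instant-NGP this is what shortens per-scene fitting. In a full pipeline these features would feed a small MLP.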

Paper Structure

This paper contains 10 sections, 8 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: FewShot-NeRF: Learning Rich 3D Scenes from Minimal Camera Poses. Conventional NeRF training requires nearly 100 camera poses per scene. Our approach reduces this requirement by using meta-learning to acquire an optimized initialization for NeRF. Because a 3D prior is built into the parameter initialization, FewShot-NeRF learns a 3D scene from a minimal set of camera poses, reducing the number of frames required.
  • Figure 2: Method Overview: (Left) Our approach is based on meta-learning an initialization. We iteratively adjust the initialization by shifting it closer to the optimal parameters of NeRFs fitted to various scenes within the same category. Because this update draws on a wide range of scenes from the category, it embeds their shared geometric structure into the initialization. (Center) At test time, we use 2 to 6 images from distinct viewpoints and start NeRF fitting from the learned initialization. (Right) The resulting NeRF model synthesizes novel views of the depicted scene.
  • Figure 3: Evolution of PSNR Across Meta-Training Iterations. Peak Signal-to-Noise Ratio (PSNR) increases steadily with the number of meta-training iterations. The evaluation covers an average of 10 scenes per category and shows that image quality improves consistently through iterative meta-training.
  • Figure 4: Qualitative results for four categories: hydrant, apple, ball, and donut. Each sequence starts with the training input and continues with novel views generated from only 2, 3, and 6 training images. It shows how the model improves the realism and accuracy of the generated novel views.
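Figure 3 reports image quality as PSNR. For images with pixel values in [0, MAX], PSNR = 10 · log10(MAX² / MSE), so each 10 dB gain corresponds to a tenfold drop in mean squared error. The minimal sketch below assumes images flattened to sequences of floats in [0, 1]. The `mean_psnr` helper is a hypothetical illustration of per-category averaging; the source does not specify how the paper aggregates scenes.

```python
import math

def psnr(rendered, reference, max_val=1.0):
    """PSNR in dB between two images given as flat sequences of values in [0, max_val]."""
    if len(rendered) != len(reference) or not reference:
        raise ValueError("images must be non-empty and have the same number of values")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def mean_psnr(pairs):
    # Hypothetical helper: average PSNR over (rendered, reference) pairs, e.g. scenes in a category.
    return sum(psnr(r, t) for r, t in pairs) / len(pairs)

if __name__ == "__main__":
    # A uniform error of 0.1 gives MSE = 0.01, i.e. about 20 dB.
    print(psnr([0.5] * 4, [0.6] * 4))
```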