Knowledge NeRF: Few-shot Novel View Synthesis for Dynamic Articulated Objects

Wenxiao Cai; Xinyue Lei; Xinyu He; Junming Leo Chen; Yangang Wang

TL;DR

Knowledge NeRF addresses sparse-view novel view synthesis for dynamic articulated objects by transferring knowledge from a pretrained NeRF (the knowledge base) to a new state through a lightweight projection module $F_P$, paired with a frozen NeRF $F_\theta$. Framed as a two-frame problem, the method trains on 5 input images per state to render high-fidelity views via volume rendering using projected coordinates $F_P(\mathbf{r}_1)$. Key contributions include the lightweight projection module, an end-to-end training pipeline with staged NeRF pretraining and fine-tuning, and strong empirical results on synthetic and real-world datasets showing image quality competitive with NeRFs trained on much larger datasets. This approach enables efficient dynamic 3D reconstruction with minimal data, with broad potential impact on AR/VR, 3D content production, and robotics.
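
The volume rendering mentioned above can be written out as follows. This is a sketch under stated assumptions, not the paper's own equation: it uses the standard NeRF quadrature and assumes that $F_P$ maps each sample point $\mathbf{x}_i = \mathbf{o} + t_i\mathbf{d}$ on a current-state ray $\mathbf{r}_1$ into the knowledge-base space, while the viewing direction $\mathbf{d}$ is passed to $F_\theta$ unchanged.

$$(\sigma_i, \mathbf{c}_i) = F_\theta\big(F_P(\mathbf{x}_i), \mathbf{d}\big), \qquad T_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big), \qquad \hat{C}(\mathbf{r}_1) = \sum_{i=1}^{N} T_i \big(1 - e^{-\sigma_i \delta_i}\big)\,\mathbf{c}_i,$$

where $\delta_i = t_{i+1} - t_i$ is the spacing between adjacent samples and $N$ is the number of samples along the ray.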

Abstract

We present Knowledge NeRF to synthesize novel views of dynamic scenes. Reconstructing dynamic 3D scenes from a few sparse views and rendering them from arbitrary viewpoints is a challenging problem with applications in many domains. Previous dynamic NeRF methods learn the deformation of articulated objects from monocular videos; however, the quality of their reconstructed scenes is limited. To reconstruct dynamic scenes clearly, we propose a new framework that considers two frames at a time. We pretrain a NeRF model for an articulated object. When the articulated object moves, Knowledge NeRF learns to generate novel views of the new state by combining the knowledge stored in the pretrained NeRF model with a few observations of the present state. We propose a projection module that adapts NeRF to dynamic scenes by learning the correspondence between the pretrained knowledge base and the current state. Experimental results demonstrate the effectiveness of our method in reconstructing dynamic 3D scenes from 5 input images per state. Knowledge NeRF is a new pipeline and a promising solution for novel view synthesis of dynamic articulated objects. The data and implementation are publicly available at https://github.com/RussRobin/Knowledge_NeRF.
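
The core idea of the abstract, reusing a pretrained field and learning only the correspondence from a few observations of the new state, can be illustrated with a toy optimization. All names and modeling choices below are hypothetical and are not the authors' implementation: a 2D Gaussian stands in for the frozen NeRF $F_\theta$, the projection module is reduced to a pure translation, the five "images" are five scalar samples, and gradients come from finite differences instead of automatic differentiation. Only the projection parameters are updated; the field itself is never modified.

```python
import math

# Hypothetical frozen "knowledge base": a smooth 2D scalar field standing in
# for the pretrained NeRF F_theta. Nothing below ever changes it.
def frozen_field(p):
    return math.exp(-(p[0] ** 2 + p[1] ** 2))

# Hypothetical projection module: a pure translation p -> p - t that maps a
# point in the current state back to the knowledge-base space.
def project(p, t):
    return (p[0] - t[0], p[1] - t[1])

def loss(t, xs, targets):
    """Mean squared error between the projected frozen field and the observations."""
    errors = [(frozen_field(project(x, t)) - y) ** 2 for x, y in zip(xs, targets)]
    return sum(errors) / len(errors)

def fit_projection(xs, targets, lr=0.5, steps=2000, h=1e-5):
    """Gradient descent on the projection parameters only (field stays frozen)."""
    t = [0.0, 0.0]  # initial guess: no motion between the two states
    for _ in range(steps):
        grad = []
        for k in range(2):  # central finite differences; a real model would use autodiff
            tp, tm = list(t), list(t)
            tp[k] += h
            tm[k] -= h
            grad.append((loss(tp, xs, targets) - loss(tm, xs, targets)) / (2 * h))
        t = [t[k] - lr * grad[k] for k in range(2)]
    return t

# Usage: the object moved by an unknown shift; five samples of the new state.
true_shift = (0.3, -0.2)
xs = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
targets = [frozen_field(project(x, true_shift)) for x in xs]
t_hat = fit_projection(xs, targets)  # approximately [0.3, -0.2]
```

In a full-scale version of this sketch, the parameters of $F_P$ would be optimized by backpropagation through the rendering step, while the pretrained field keeps supplying appearance and geometry.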

Paper Structure

This paper contains 18 sections, 7 equations, 7 figures, and 2 tables.

Introduction
Related work
Dynamic NeRF
Sparse View Reconstruction and Few-shot NeRF
Problem formulation
The proposed method
Lightweight projection module
Training pipeline
Loss function
Experiments
Implementation details
Datasets and Baseline methods
Qualitative comparison
Quantitative comparison
Ablation studies
...and 3 more sections

Figures (7)

Figure 1: We propose a new framework, Knowledge NeRF, for dynamic novel view synthesis of articulated objects. Starting from a NeRF model pretrained on one frame, we train each subsequent frame separately with 5 input images. Knowledge NeRF renders images as clear as those of a NeRF trained with abundant inputs.
Figure 2: Overview of the proposed problem. Whatever rigid transformation the articulated object undergoes, we can train its 3D model in the new state from just five images.
Figure 3: The proposed framework. A projection module estimates the transformation between the original space and the deformed space. To render a pixel in the deformed space, we trace a ray from the camera in the corresponding direction. To obtain the color and opacity of each point traversed by the ray, the projection module finds the corresponding point in the original space, and the NeRF model pretrained in the original space then determines the color and opacity of that point.
Figure 4: Qualitative results on 8 datasets. We compare sparse-view reconstruction results of the proposed method and also show the knowledge base our method uses. Note that views in the knowledge base and in the current state are randomly sampled.
Figure 5: Qualitative results on our proposed real-world dataset.
...and 2 more figures
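
The rendering steps in the Figure 3 caption (trace a ray in the deformed space, project each sample to the original space, query the pretrained NeRF) can be illustrated with a small, self-contained sketch. Everything concrete here is an assumption for illustration, not the authors' code: `frozen_field` is a hypothetical analytic stand-in for the pretrained NeRF $F_\theta$ (a uniformly colored sphere that ignores view direction), and `make_projection` is a hypothetical rigid-motion stand-in for the learned projection module $F_P$. The accumulation over samples is the standard NeRF volume-rendering quadrature.

```python
import math

# Hypothetical stand-in for the frozen knowledge-base NeRF F_theta:
# a solid sphere of radius 0.5 at the origin with a constant RGB color.
def frozen_field(p):
    """Return (density, rgb) at point p in the original (knowledge-base) space."""
    inside = sum(c * c for c in p) <= 0.25
    sigma = 50.0 if inside else 0.0
    return sigma, (1.0, 0.2, 0.2)

# Hypothetical stand-in for the learned projection module F_P: a rigid motion
# (counterclockwise rotation about z by `angle`, then translation by `shift`)
# mapping a point in the deformed (current) space to the original space.
def make_projection(angle, shift):
    ca, sa = math.cos(angle), math.sin(angle)
    def project(p):
        x, y, z = p
        return (ca * x - sa * y + shift[0], sa * x + ca * y + shift[1], z + shift[2])
    return project

def render_ray(origin, direction, project, near=0.0, far=4.0, n_samples=128):
    """Volume-render one ray of the current state by querying the frozen field
    at projected sample points; returns (rgb, accumulated opacity)."""
    delta = (far - near) / n_samples
    transmittance, rgb = 1.0, [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of the i-th interval
        point = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, color = frozen_field(project(point))  # query in original space
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(rgb), 1.0 - transmittance

# Usage: in the current state the sphere has moved by +1 along x, so the
# projection shifts points by -1 along x to find them in the original space.
project = make_projection(0.0, (-1.0, 0.0, 0.0))
hit_rgb, hit_opacity = render_ray((1.0, 0.0, -2.0), (0.0, 0.0, 1.0), project)
miss_rgb, miss_opacity = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), project)
```

Because only the projection changes between states, the same frozen field can serve every new state; in the paper, that correspondence is learned from the five input images rather than fixed by hand as it is here.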
