UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation

Mengqi Guo, Chen Li, Hanlin Chen, Gim Hee Lee

TL;DR

This work tackles incremental learning for Neural Implicit Representations (NIRs) to enable streaming-data 3D reconstruction and view synthesis without storing past data. It introduces a self-contained student–teacher framework augmented with a random inquirer and an uncertainty-based filter to distill knowledge from past models into the current learner. The approach improves substantially over baselines on NeRF- and MonoSDF-based tasks and achieves results competitive with batch-trained upper bounds while using minimal memory. By enabling continual learning across large-scale scenes and diverse NIRs, the method offers practical gains for real-world streaming perception systems.

Abstract

Recent neural implicit representations (NIRs) have achieved great success in 3D reconstruction and novel view synthesis. However, they require all images of a scene, captured from different camera views, to be available at once for training. This is expensive, especially for large-scale scenes and limited data storage. In view of this, we explore the task of incremental learning for NIRs in this work. We design a student-teacher framework to mitigate the catastrophic forgetting problem. Specifically, at the end of each time step the student becomes the teacher, and this teacher guides the training of the student in the next step. As a result, the student network learns new information from the streaming data while simultaneously retaining old knowledge from the teacher network. Although intuitive, naively applying the student-teacher pipeline does not work well in our task: not all information from the teacher network is helpful, since the teacher is trained only on old data. To alleviate this problem, we further introduce a random inquirer and an uncertainty-based filter to select useful information. Our proposed method is general and can therefore be adapted to different implicit representations, such as neural radiance fields (NeRF) and neural surface fields. Extensive experimental results for both 3D reconstruction and novel view synthesis demonstrate the effectiveness of our approach compared to different baselines.
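The iterated student-teacher loop described above can be illustrated with a minimal sketch. It rests on several assumptions. A toy 1-D regression stands in for a NeRF or MonoSDF network: the "implicit representation" is a cubic polynomial fitted by exact least squares. The names `distill_step` and `incremental_train` are hypothetical. Pseudo-labels are weighted equally with real data. The inquirer simply samples uniformly inside the input range seen so far, whereas the paper instead filters random queries by uncertainty. Between steps, only the trained weights and that range are carried forward; past data is never stored.

```python
import copy
import math
import random


def features(x):
    # Toy stand-in for an implicit network: a linear model on [1, x, x^2, x^3].
    return [1.0, x, x * x, x ** 3]


def predict(w, x):
    return sum(wi * fi for wi, fi in zip(w, features(x)))


def fit_least_squares(xs, ys):
    """Exact least-squares fit via the normal equations and Gaussian elimination."""
    n = len(features(0.0))
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for x, y in zip(xs, ys):
        f = features(x)
        for i in range(n):
            b[i] += f[i] * y
            for j in range(n):
                A[i][j] += f[i] * f[j]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def distill_step(teacher, xs_new, ys_new, queries):
    """Fit the student on D^t plus teacher pseudo-labels at the query inputs."""
    xs = list(xs_new) + list(queries)
    ys = list(ys_new) + [predict(teacher, q) for q in queries]
    return fit_least_squares(xs, ys)


def incremental_train(stream, n_queries=50, seed=0):
    """Iterate over time steps; each trained student becomes the next teacher."""
    rng = random.Random(seed)
    teacher, student, seen = None, None, None
    for xs_new, ys_new in stream:
        if teacher is None:
            student = fit_least_squares(xs_new, ys_new)
        else:
            # Assumed inquirer: uniform queries inside the input range seen so far.
            queries = [rng.uniform(*seen) for _ in range(n_queries)]
            student = distill_step(teacher, xs_new, ys_new, queries)
        lo, hi = min(xs_new), max(xs_new)
        seen = (lo, hi) if seen is None else (min(seen[0], lo), max(seen[1], hi))
        teacher = copy.deepcopy(student)  # student -> teacher for step t + 1
    return student


def target(x):
    # Ground-truth signal for the demo; deliberately not a cubic.
    return math.sin(3 * x)


if __name__ == "__main__":
    steps = [[k + i / 20 for i in range(21)] for k in range(2)]  # [0,1], then [1,2]
    stream = [(xs, [target(x) for x in xs]) for xs in steps]
    w = incremental_train(stream)
    old_err = max(abs(predict(w, x) - target(x)) for x in steps[0])
    print(f"max error on step-0 inputs after step 1: {old_err:.3f}")
```

Each student minimizes the combined loss exactly. As a result, its disagreement with the teacher on the distillation queries can never exceed that of a student fitted to the new data alone. This is a toy version of the forgetting that distillation is meant to prevent.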

Paper Structure (66 sections, 10 equations, 15 figures, 10 tables)

Figures (15)

  • Figure 1: Visualization of the 3D reconstruction by MonoSDF yu2022monosdf and our approach under the incremental setting. After being trained with new data, MonoSDF fails to reconstruct the 3D surface observed at $t=0$ because of the forgetting problem. In comparison, our approach reconstructs both previously seen and new data.
  • Figure 2: The overall framework of our proposed student-teacher pipeline. At time step $t$, the student network learns simultaneously from the currently available data $\mathcal{D}^t$ and from the knowledge previously learned by the teacher network. The input of the teacher network is generated by the random inquirer. The teacher's output is filtered by an uncertainty-based filter to select useful information. $V$ denotes the differentiable volume renderer.
  • Figure 3: Qualitative comparison on the ICL-NUIM and Replica datasets. Both 'MonoSDF' and 'Ours' models are incrementally trained on the 10-step training datasets. Red boxes mark previously learned views.
  • Figure 4: Qualitative comparison on the ScanNet and 360Capture datasets. 'NeRF' and 'Ours' models are incrementally trained on the 10-step training datasets. $\mathcal{D}^0$, $\mathcal{D}^3$, and $\mathcal{D}^6$ denote results on previous views from the test sets of the corresponding time steps, and $\mathcal{D}^9$ denotes results on current views from the latest test set.
  • Figure 5: Comparison of our method and NICE-SLAM under different memory usage.
  • ...and 10 more figures
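The random inquirer and uncertainty-based filter from the Figure 2 caption can be sketched as follows. This is a toy, hypothetical illustration with several simplifications. Queries are scalar inputs rather than camera rays. `toy_teacher` fakes a per-query variance that grows outside the range it was "trained" on. The fixed threshold `max_var` is an assumed hyperparameter. This section does not describe how the paper actually estimates uncertainty or chooses the filtering rule.

```python
import random


def random_inquirer(n, lo, hi, rng):
    """Sample n random query inputs uniformly in [lo, hi]; no past data is needed."""
    return [rng.uniform(lo, hi) for _ in range(n)]


def uncertainty_filter(teacher, queries, max_var):
    """Keep (query, teacher value) pairs whose teacher variance is <= max_var."""
    kept = []
    for q in queries:
        value, var = teacher(q)
        if var <= max_var:
            kept.append((q, value))
    return kept


def distill_loss(student, pairs):
    """Mean squared error between student outputs and the kept teacher targets."""
    if not pairs:
        return 0.0
    return sum((student(q) - v) ** 2 for q, v in pairs) / len(pairs)


def toy_teacher(q):
    """Hypothetical teacher returning (prediction, variance).

    Pretends it was trained on inputs in [0, 1]: the variance is small there
    and grows quadratically with the distance outside that interval.
    """
    dist = max(0.0, q - 1.0, -q)
    return 2.0 * q, 0.01 + dist ** 2


def offset_student(q):
    # A deliberately imperfect student for the demo.
    return 2.0 * q + 0.1


if __name__ == "__main__":
    rng = random.Random(0)
    queries = random_inquirer(200, -1.0, 2.0, rng)
    pairs = uncertainty_filter(toy_teacher, queries, max_var=0.05)
    print(len(pairs), "of", len(queries), "queries kept")
    print("distillation loss:", distill_loss(offset_student, pairs))
```

Only the filtered pairs enter the distillation term. Queries where the teacher is unreliable (here, inputs outside [0, 1]) therefore cannot pull the student toward stale predictions.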