NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods
Jonas Kulhanek, Torsten Sattler
TL;DR
Novel view synthesis methods (NeRFs and 3D Gaussian Splatting) are hard to compare fairly because their evaluation protocols are not uniform. NerfBaselines provides a standardized, reproducible evaluation framework with wrappers around official code, unified datasets, and a shared protocol, plus an online benchmark and interactive viewer. By reproducing published results and running cross-dataset analyses (e.g., Mip-NeRF360, Blender, Tanks & Temples), the work demonstrates that small protocol shifts can invert method rankings, underscoring the need for consistent benchmarking. The framework lowers adoption barriers and enables robust, scalable comparisons across diverse methods and datasets.
Abstract
Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and robotic simulations. With the recent rapid development of Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) methods, it is becoming difficult to keep track of the current state of the art (SoTA) because methods use different evaluation protocols, codebases are difficult to install and use, and methods do not generalize well to novel 3D scenes. In our experiments, we show that even tiny differences in the evaluation protocols of various methods can artificially boost the reported performance of these methods. This raises questions about the validity of quantitative comparisons performed in the literature. To address these questions, we propose NerfBaselines, an evaluation framework which provides consistent benchmarking tools, ensures reproducibility, and simplifies the installation and use of various methods. We validate our implementation experimentally by reproducing the numbers reported in the original papers. For improved accessibility, we release a web platform that compares commonly used methods on standard benchmarks. We strongly believe NerfBaselines is a valuable contribution to the community as it ensures that quantitative results are comparable and thus truly measure progress in the field of novel view synthesis.
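As a rough illustration of how a small protocol choice can move a reported metric, the sketch below computes PSNR for the same predictions twice: once on floating-point values and once after rounding the predictions to 8-bit. The choice of quantization as the protocol difference, the helper names `psnr` and `quantize_8bit`, and the synthetic data are all assumptions made for this example; they are not taken from the paper or from the NerfBaselines code.

```python
import math


def psnr(pred, target):
    """PSNR in dB for intensities in [0, 1] (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0 else -10.0 * math.log10(mse)


def quantize_8bit(values):
    """Clamp to [0, 1] and round to the nearest 8-bit level (hypothetical helper)."""
    return [round(min(max(v, 0.0), 1.0) * 255) / 255 for v in values]


# Synthetic ground truth: intensities that are exactly representable in 8 bits.
target = [i / 255 for i in range(0, 256, 5)]

# Synthetic predictions: ground truth plus small deterministic errors
# between -0.003 and 0.003, clamped to the valid range.
pred = [
    min(max(t + 0.001 * ((k % 7) - 3), 0.0), 1.0)
    for k, t in enumerate(target)
]

# Protocol A: evaluate the floating-point predictions directly.
psnr_float = psnr(pred, target)

# Protocol B: round predictions to 8-bit first, as if saved to an image file.
psnr_quant = psnr(quantize_8bit(pred), target)

print(f"PSNR (float predictions): {psnr_float:.2f} dB")
print(f"PSNR (8-bit predictions): {psnr_quant:.2f} dB")
```

The two numbers differ even though the underlying predictions are identical, and depending on the error pattern the rounding step can shift the score in either direction. Two methods evaluated under different conventions of this kind are therefore not directly comparable.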
