Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark
Zhengfei Kuang, Yunzhi Zhang, Hong-Xing Yu, Samir Agarwala, Shangzhe Wu, Jiajun Wu
TL;DR
Stanford-ORB addresses the lack of real-world evaluation for object inverse rendering by introducing a dataset with ground-truth 3D scans, HDR multi-view images, and environment lighting for 14 objects across 7 scenes. It defines three evaluation tasks (geometry estimation, novel scene relighting, and novel view synthesis) and provides a full capture, processing, and pose-registration pipeline. A broad set of baselines spanning material decomposition, NeRF/IDR-based geometry reconstruction, and single-view intrinsics prediction is benchmarked. The results indicate that differentiable Monte Carlo renderers such as NVDiffRecMC improve relighting and view synthesis, while explicit geometry representations yield more precise shape reconstruction. The work releases data, code, and evaluation protocols, enabling rigorous real-world benchmarking and highlighting the remaining gaps in generalizing inverse rendering to complex real-world lighting.
Abstract
We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark. Recent advances in inverse rendering have enabled a wide range of real-world applications in 3D content generation, moving rapidly from research and commercial use cases to consumer devices. While the results continue to improve, there is no real-world benchmark that can quantitatively assess and compare the performance of various inverse rendering methods. Existing real-world datasets typically only consist of the shape and multi-view images of objects, which are not sufficient for evaluating the quality of material recovery and object relighting. Methods capable of recovering material and lighting often resort to synthetic data for quantitative evaluation, which on the other hand does not guarantee generalization to complex real-world environments. We introduce a new dataset of real-world objects captured under a variety of natural scenes with ground-truth 3D scans, multi-view images, and environment lighting. Using this dataset, we establish the first comprehensive real-world evaluation benchmark for object inverse rendering tasks from in-the-wild scenes, and compare the performance of various existing methods.
