ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition
Quoc-Anh Bui, Gilles Rougeron, Géraldine Morin, Simone Gasparini
TL;DR
ROI-NeRFs tackle high-fidelity visualization of objects of interest inside large scenes by decomposing the scene into a global Scene NeRF and multiple ROI NeRFs, each trained on ROI-focused views selected by Object-Focused Camera Grouping. A Ray-level Compositional Rendering pipeline blends ROI detail with global context, using depth-based ray filtering and Uniform Ray Composition to fuse information across NeRFs efficiently. Experiments on the Egypt and Hôtel de la Marine datasets show consistent ROI-quality gains over strong baselines at manageable inference cost, and ablations validate the contributions of ROI Sample Replacement and depth-based ray filtering. The approach enables high-LOD rendering of culturally valuable ROI content without retraining the entire scene, with potential for hierarchical decomposition and object editing in future work.
Abstract
Efficient and accurate 3D reconstruction is essential for applications in cultural heritage. This study addresses the challenge of visualizing objects within large-scale scenes at a high level of detail (LOD) using Neural Radiance Fields (NeRFs). The aim is to improve the visual fidelity of chosen objects while keeping computation efficient by spending detail only on relevant content. The proposed ROI-NeRFs framework divides the scene into a Scene NeRF, which represents the overall scene at moderate detail, and multiple ROI NeRFs that focus on user-defined objects of interest. During the decomposition phase, an object-focused camera selection module automatically groups the relevant cameras for training each NeRF. In the composition phase, a Ray-level Compositional Rendering technique combines information from the Scene NeRF and ROI NeRFs, allowing several objects to be composed and rendered simultaneously. Quantitative and qualitative experiments conducted on two real-world datasets, including one of a complex eighteenth-century cultural heritage room, demonstrate superior performance compared to baseline methods: the framework improves LOD in object regions and minimizes artifacts without significantly increasing inference time.
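To make the composition phase more concrete, the following is a minimal sketch of one plausible reading of ray-level composition: samples along a ray are taken from the Scene NeRF, and those falling inside an ROI's bounding box are replaced by samples from the corresponding ROI NeRF before standard volume rendering. It is an illustration under stated assumptions, not the authors' implementation. The names `ray_aabb`, `composite`, `render_ray`, `toy_scene_field`, and `toy_roi_field` are hypothetical, the ROI is assumed to be an axis-aligned box, the two fields are toy stand-ins for trained networks, and depth-based ray filtering is not modeled.

```python
import numpy as np

def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) of the ray/box overlap, or None on a miss."""
    with np.errstate(divide="ignore", invalid="ignore"):
        inv = 1.0 / direction
        t0 = (box_min - origin) * inv
        t1 = (box_max - origin) * inv
    t_near = np.max(np.minimum(t0, t1))
    t_far = np.min(np.maximum(t0, t1))
    if t_far < max(t_near, 0.0):
        return None
    return max(t_near, 0.0), t_far

def composite(ts, sigmas, colors):
    """Standard NeRF quadrature over samples sorted by distance ts."""
    deltas = np.diff(ts, append=ts[-1] + 1e10)   # last interval treated as open-ended
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return (trans * alphas) @ colors             # weighted sum of sample colors

def render_ray(origin, direction, scene_field, roi_field, box_min, box_max,
               near=0.0, far=6.0, n=64):
    """Uniformly sample the ray, query the scene field, and swap in ROI samples
    wherever the ray is inside the ROI box (assumed composition rule)."""
    ts = np.linspace(near, far, n)
    pts = origin + ts[:, None] * direction
    sigmas, colors = scene_field(pts)
    hit = ray_aabb(origin, direction, box_min, box_max)
    if hit is not None:
        inside = (ts >= hit[0]) & (ts <= hit[1])
        if inside.any():
            sigmas, colors = sigmas.copy(), colors.copy()
            sigmas[inside], colors[inside] = roi_field(pts[inside])
    return composite(ts, sigmas, colors)

# Toy stand-ins for trained NeRFs: an empty scene and a dense red ROI.
def toy_scene_field(pts):
    return np.zeros(len(pts)), np.full((len(pts), 3), 0.5)

def toy_roi_field(pts):
    return np.full(len(pts), 50.0), np.tile([1.0, 0.0, 0.0], (len(pts), 1))

box_min, box_max = np.array([-1.0, -1.0, 2.0]), np.array([1.0, 1.0, 3.0])
rgb_hit = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                     toy_scene_field, toy_roi_field, box_min, box_max)
rgb_miss = render_ray(np.array([5.0, 5.0, 0.0]), np.array([0.0, 0.0, 1.0]),
                      toy_scene_field, toy_roi_field, box_min, box_max)
```

In this sketch only rays that intersect the ROI box ever query the ROI field, which is one way the extra cost of high-detail rendering can stay confined to the relevant content; rays that miss every ROI are rendered from the scene field alone.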
