ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition

Quoc-Anh Bui, Gilles Rougeron, Géraldine Morin, Simone Gasparini

TL;DR

ROI-NeRFs tackle high-fidelity visualization of objects of interest inside large scenes by decomposing the scene into a global Scene NeRF and multiple ROI NeRFs, each trained on ROI-focused views selected by Object-Focused Camera Grouping. A Ray-level Compositional Rendering pipeline blends ROI detail with global context, using depth-based ray filtering and Uniform Ray Composition to fuse information across NeRFs efficiently. Experiments on the Egypt and Hôtel de la Marine datasets show consistent ROI-quality gains over strong baselines at a manageable inference cost, and ablations validate the contributions of ROI Sample Replacement and depth-based ray filtering. The approach enables high-LOD rendering of culturally valuable ROI content without retraining the entire scene, and leaves hierarchical decomposition and object editing as future work.

Abstract

Efficient and accurate 3D reconstruction is essential for applications in cultural heritage. This study addresses the challenge of visualizing objects within large-scale scenes at a high level of detail (LOD) using Neural Radiance Fields (NeRFs). The aim is to improve the visual fidelity of chosen objects while keeping computation efficient by spending detail only on relevant content. The proposed ROI-NeRFs framework divides the scene into a Scene NeRF, which represents the overall scene at moderate detail, and multiple ROI NeRFs that focus on user-defined objects of interest. During the decomposition phase, an object-focused camera selection module automatically groups the relevant cameras for training each NeRF. In the composition phase, a Ray-level Compositional Rendering technique combines information from the Scene NeRF and the ROI NeRFs, allowing several objects to be composed and rendered simultaneously. Quantitative and qualitative experiments conducted on two real-world datasets, including one of a complex eighteenth-century cultural heritage room, demonstrate superior performance compared to baseline methods: the framework improves LOD for object regions and minimizes artifacts without significantly increasing inference time.
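
As a rough illustration of the object-focused camera selection mentioned in the abstract, the sketch below applies the criterion stated in the Figure 3 caption: a camera joins the ROI group if it observes at least 10% of the sparse 3D points inside the ROI's AABB, and the remaining cameras train the Scene NeRF. The function names, the `observations` mapping (camera id to indices of SfM points it sees), and the toy data are hypothetical and are not taken from the authors' code. The sketch handles a single ROI only.

```python
import numpy as np


def points_in_aabb(points, box_min, box_max):
    """Boolean mask of the points that lie inside the axis-aligned box."""
    return np.all((points >= box_min) & (points <= box_max), axis=1)


def group_cameras(points, observations, box_min, box_max, min_fraction=0.10):
    """Split camera ids into (roi_cameras, scene_cameras).

    points:       (N, 3) sparse SfM point cloud.
    observations: dict camera_id -> iterable of indices into `points`
                  observed by that camera (assumed input format).
    min_fraction: share of the ROI points a camera must observe
                  (10% in the Figure 3 caption).
    """
    roi_ids = set(np.flatnonzero(points_in_aabb(points, box_min, box_max)).tolist())
    roi_cams, scene_cams = [], []
    for cam_id, seen in observations.items():
        # Fraction of the ROI's points that this camera observes.
        fraction = len(roi_ids & set(seen)) / len(roi_ids) if roi_ids else 0.0
        (roi_cams if fraction >= min_fraction else scene_cams).append(cam_id)
    return roi_cams, scene_cams


# Toy data: 20 points inside the unit box, 5 points far outside it.
points = np.vstack([np.full((20, 3), 0.5), np.full((5, 3), 5.0)])
observations = {
    "cam_a": [0, 1, 2, 20],  # sees 3 of 20 ROI points (15%)
    "cam_b": [20, 21],       # sees no ROI points
    "cam_c": [0, 22],        # sees 1 of 20 ROI points (5%)
}
roi_cams, scene_cams = group_cameras(
    points, observations, np.zeros(3), np.ones(3)
)
print("ROI cameras:", roi_cams)
print("Scene cameras:", scene_cams)
```

With these toy inputs only `cam_a` clears the threshold. Exposing `min_fraction` as a parameter makes it easy to see how the split shifts when the threshold changes.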

Paper Structure

This paper contains 23 sections, 2 equations, 16 figures, 8 tables.

Figures (16)

  • Figure 1: An overview of the proposed Region of Interest-Focused NeRFs (ROI-NeRFs) framework, consisting of two steps: scene decomposition and composition. In decomposition, the scene is divided into Scene and ROI groups, with camera sets automatically selected for training each NeRF. The composition stage integrates the high-detail ROI NeRFs and the global Scene NeRF to produce high-quality renderings with enhanced detail for objects in the ROIs.
  • Figure 2: The sparse 3D point cloud and the camera positions (in blue) estimated by SfM from the Egypt dataset.
  • Figure 3: An example of an ROI: a table and surrounding chairs within the green AABB, manually selected from the sparse point cloud. Cameras in green focus on this ROI; they are selected automatically by the criterion of observing at least 10% of the 3D points inside the ROI. The green cameras are used to train the ROI NeRF, while the remaining blue cameras train the Scene NeRF. For clarity, only a single ROI is shown in this example.
  • Figure 4: A render of the object of interest (in the green box) and the surrounding scene from the corresponding ROI NeRF. The object is learned in greater detail, while the surrounding and background regions show a noticeably lower quality with floaters and missing geometry.
  • Figure 5: Ray-level Composition. (a) For a given ray passing through the ROI, (b) the sampled points from the ROI and Scene NeRF differ. Since both rays are within the same normalized space, the bounding intervals on the rays inside the AABB are identical, represented by the green boxes. (c) We replace all Scene points within the AABB with points sampled from the ROI NeRF to obtain the composed ray.
  • ...and 11 more figures
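
The sample replacement described in the Figure 5 caption can be sketched as follows, assuming both NeRFs share one normalized space so that the ray's interval inside the AABB is the same for both. Scene samples inside the interval are dropped, ROI samples inside it are kept, and the two sets are merged by distance along the ray. The function names and toy values are hypothetical. The final `volume_render` step uses the standard NeRF quadrature, which is an assumption about how the composed samples are integrated rather than a detail confirmed by this page.

```python
import numpy as np


def ray_aabb_interval(origin, direction, box_min, box_max):
    """Slab test: (t_near, t_far) of the ray inside the box, or None on a miss."""
    # Replace zero direction components by a tiny value to avoid dividing by zero.
    safe_dir = np.where(direction == 0.0, 1e-12, direction)
    t0 = (box_min - origin) / safe_dir
    t1 = (box_max - origin) / safe_dir
    t_near = max(float(np.max(np.minimum(t0, t1))), 0.0)
    t_far = float(np.min(np.maximum(t0, t1)))
    return (t_near, t_far) if t_near < t_far else None


def compose_ray(t_scene, sigma_scene, rgb_scene, t_roi, sigma_roi, rgb_roi, interval):
    """Replace Scene samples inside the AABB interval with ROI samples."""
    t_near, t_far = interval
    keep_scene = (t_scene < t_near) | (t_scene > t_far)  # Scene: outside the box
    keep_roi = (t_roi >= t_near) & (t_roi <= t_far)      # ROI: inside the box
    t = np.concatenate([t_scene[keep_scene], t_roi[keep_roi]])
    sigma = np.concatenate([sigma_scene[keep_scene], sigma_roi[keep_roi]])
    rgb = np.concatenate([rgb_scene[keep_scene], rgb_roi[keep_roi]])
    order = np.argsort(t, kind="stable")                 # sort along the ray
    return t[order], sigma[order], rgb[order]


def volume_render(t, sigma, rgb):
    """Standard NeRF quadrature over the composed samples (assumed step)."""
    delta = np.diff(t, append=t[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * delta)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    return (weights[:, None] * rgb).sum(axis=0)


# Toy ray along +x that crosses an ROI box spanning x in [2, 3].
origin = np.array([0.0, 0.5, 0.5])
direction = np.array([1.0, 0.0, 0.0])
interval = ray_aabb_interval(
    origin, direction, np.array([2.0, 0.0, 0.0]), np.array([3.0, 1.0, 1.0])
)

t_scene = np.array([0.5, 1.5, 2.5, 3.5])
sigma_scene = np.zeros(4)                             # empty space in the Scene NeRF
rgb_scene = np.tile([0.0, 0.0, 1.0], (4, 1))          # blue
t_roi = np.array([1.9, 2.2, 2.4, 2.6, 2.8, 3.1])
sigma_roi = np.full(6, 50.0)                          # dense object in the ROI NeRF
rgb_roi = np.tile([1.0, 0.0, 0.0], (6, 1))            # red

t, sigma, rgb = compose_ray(
    t_scene, sigma_scene, rgb_scene, t_roi, sigma_roi, rgb_roi, interval
)
color = volume_render(t, sigma, rgb)
print("composed distances:", t)
print("rendered color:", color)
```

In this toy case the Scene sample at 2.5 is discarded, the ROI samples at 1.9 and 3.1 fall outside the box and are ignored, and the rendered color is dominated by the dense red ROI samples.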