NVSim: Novel View Synthesis Simulator for Large Scale Indoor Navigation

Mingyu Jeong, Eunsung Kim, Sehun Park, Andrew Jaeyong Choi

TL;DR

NVSim addresses the scalability and realism gap in indoor VLN simulators by automatically building navigable environments from traversal image sequences. It advances a two-stage approach: (i) scalable 3D scene representation via submaps using Floor-Aware Gaussian Splatting to suppress floor artifacts, and (ii) mesh-free traversability to construct a topological graph G=(V,E) directly from rendered views. Its contributions include a Hybrid Floor Segmentation method, a spherical-harmonics background for floor regions, and a BFS-based topomap generation that yields valid navigable graphs without meshes. Evaluations on the COEX dataset show effective floor artifact removal, robust topological maps, and feasible R2R-style navigation, illustrating NVSim’s potential to enable scalable VLN research and broader embodied AI applications.

Abstract

We present NVSim, a framework that automatically constructs large-scale, navigable indoor simulators from only common image sequences, overcoming the cost and scalability limitations of traditional 3D scanning. Our approach adapts 3D Gaussian Splatting to address visual artifacts on sparsely observed floors, a common issue in robotic traversal data. We introduce Floor-Aware Gaussian Splatting to ensure a clean, navigable ground plane, and a novel mesh-free traversability checking algorithm that constructs a topological graph by directly analyzing rendered views. We demonstrate our system's ability to generate valid, large-scale navigation graphs from real-world data. A video demonstration is available at https://youtu.be/tTiIQt6nXC8
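
To make the BFS-based graph construction concrete, the following is a minimal sketch and not the authors' implementation. It assumes the floor plane is discretized into integer grid cells with 4-connected neighbors, and it replaces the rendered-view traversability check with a hypothetical callback `is_traversable`. The names `build_topomap`, `Node`, and `max_nodes` are illustrative only.

```python
from collections import deque
from typing import Callable, Set, Tuple

# Hypothetical discretization: a node is an integer grid cell on the floor plane.
Node = Tuple[int, int]
Edge = Tuple[Node, Node]


def build_topomap(
    seed: Node,
    is_traversable: Callable[[Node, Node], bool],
    max_nodes: int = 10_000,
) -> Tuple[Set[Node], Set[Edge]]:
    """Grow a graph G=(V,E) outward from `seed` by breadth-first search.

    `is_traversable(a, b)` is a stand-in for a check that decides whether an
    agent can move from cell a to neighboring cell b (assumption: in a real
    system this decision would come from analyzing rendered views).
    """
    vertices: Set[Node] = {seed}
    edges: Set[Edge] = set()
    queue = deque([seed])

    while queue:
        cur = queue.popleft()
        # Assumed 4-connected neighborhood with unit step.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dx, cur[1] + dy)
            if not is_traversable(cur, nxt):
                continue
            if nxt not in vertices:
                if len(vertices) >= max_nodes:
                    continue  # node budget exhausted; do not add dangling edges
                vertices.add(nxt)
                queue.append(nxt)
            # Store each undirected edge once, in sorted order.
            edges.add((min(cur, nxt), max(cur, nxt)))

    return vertices, edges


# Toy usage: a 3x2 room in which cell (1, 1) is blocked.
free_cells = {(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)}
V, E = build_topomap((0, 0), lambda a, b: b in free_cells)
```

In the toy usage the callback only tests membership in a set of free cells. The structure of the search stays the same if the callback is swapped for any other pairwise traversability test.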

Paper Structure

This paper contains 18 sections, 8 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: The left panel shows the viewpoints captured along a real-world robot trajectory. The right panel displays the dense graph of traversable viewpoints automatically generated by our method.
  • Figure 2: An overview of the NVSim framework. Given an RGB image sequence and camera poses, we first cluster the trajectory into submaps. For each submap, we generate robust floor masks using our hybrid segmentation method and then reconstruct the scene with Floor-Aware Gaussian Splatting. Finally, a mesh-free topological map is automatically generated from geometric cues such as alpha maps and surface normals.
  • Figure 3: Results comparing the masks from the hybrid floor segmentation process; (d) shows red points sampled from $M_{\text{cand}}$.
  • Figure 4: Quantitative comparison of novel view scene representations across different submaps.
  • Figure 5: Qualitative results from the ablation study on our navigation-specific scene representation.
  • ...and 5 more figures