GauU-Scene V2: Assessing the Reliability of Image-Based Metrics with Expansive Lidar Image Dataset Using 3DGS and NeRF

Butian Xiong; Nanjun Zheng; Junhua Liu; Zhen Li

TL;DR

GauU-Scene V2 addresses the problem that image-based metrics poorly reflect the underlying geometry of large-scale outdoor reconstructions. It introduces a six-scene, city-scale dataset captured with a DJI drone and a Zenmuse L1 LiDAR, paired with a simple scale-matching alignment that fuses the LiDAR data with COLMAP SfM output. The paper benchmarks several baselines (Gaussian Splatting, SuGaR, InstantNGP, and NeRFacto) and reveals a consistent mismatch between image-based metrics and geometric reconstruction quality: NeRFacto achieves better Chamfer distances but worse image scores. The work provides a practical, real-world dataset and a coordinate-alignment pipeline that enable robust evaluation of geometry-focused reconstruction methods, with implications for developing more reliable metrics and representations for outdoor scenes.

Abstract

We introduce a novel, multimodal large-scale scene reconstruction benchmark that utilizes newly developed 3D representation approaches: Gaussian Splatting and Neural Radiance Fields (NeRF). Our expansive U-Scene dataset surpasses any previously existing real large-scale outdoor LiDAR and image dataset in both area and point count. GauU-Scene encompasses over 6.5 square kilometers and features a comprehensive RGB dataset coupled with LiDAR ground truth. Additionally, we are the first to propose a LiDAR and image alignment method for a drone-based dataset. Our assessment of GauU-Scene includes a detailed analysis across various novel viewpoints, employing image-based metrics such as SSIM, LPIPS, and PSNR on NeRF and Gaussian Splatting based methods. The rankings from these image-based metrics contradict those obtained with geometry-based metrics such as Chamfer distance. The experimental results on our multimodal dataset highlight the unreliability of current image-based metrics and reveal significant drawbacks in geometric reconstruction using the current Gaussian Splatting-based method, further illustrating the necessity of our dataset for assessing geometry reconstruction tasks. We also provide detailed supplementary information on data collection protocols and make the dataset available on an anonymous project page.
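To make the geometry-based metric concrete, the following is a minimal sketch of a symmetric Chamfer distance between a reconstructed point cloud and a LiDAR reference. It is an illustration under assumptions, not the paper's evaluation code: the function names and the toy point clouds are hypothetical, and the variant shown (sum of the two mean nearest-neighbour distances, unsquared) is only one common convention; the exact variant used in the paper is not stated here. The brute-force search is quadratic and only suitable for small examples.

```python
import math


def nearest_distance(point, cloud):
    """Euclidean distance from `point` to its nearest neighbour in `cloud`."""
    return min(math.dist(point, other) for other in cloud)


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance (assumed variant): mean nearest-neighbour
    distance from A to B plus mean nearest-neighbour distance from B to A."""
    if not cloud_a or not cloud_b:
        raise ValueError("both point clouds must be non-empty")
    a_to_b = sum(nearest_distance(p, cloud_b) for p in cloud_a) / len(cloud_a)
    b_to_a = sum(nearest_distance(q, cloud_a) for q in cloud_b) / len(cloud_b)
    return a_to_b + b_to_a


# Toy data (hypothetical): the reconstruction sits 1 unit below the reference.
reconstruction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
lidar_reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(chamfer_distance(reconstruction, lidar_reference))  # prints 2.0
```

A rendering can score well on PSNR or SSIM from the evaluated viewpoints while its underlying points sit far from the LiDAR surface, which is the kind of disagreement a distance like this one is meant to expose.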

Paper Structure (14 sections, 3 equations, 7 figures, 4 tables)

Introduction
Related Work
Large Scale 3D Outdoor Dataset
Gaussian Splatting
Neural Radiance Field
U Scene Dataset
Data Property
Data Format
Data Scale
Data Collection Method
Analysis and Comparison
Experiment and Result
Conclusion
Acknowledgments

Figures (7)

Figure 1: The dataset prepared for input into the neural field and Gaussian Splatting typically consists of camera positions and images in COLMAP format. The Structure from Motion (SfM) algorithm implemented in COLMAP initializes camera positions randomly, which may not align with LiDAR data in WGS 84 coordinates. This discrepancy poses a significant challenge for geometric alignment measurement and multi-modal fusion algorithms. When inputs are in two different coordinate systems, further validation becomes impractical. To address this, we propose a straightforward yet effective method for statistical scale matching to align LiDAR point clouds with camera positions. This approach is crucial for the construction of our dataset. The details of the preprocessing process will be introduced in Section \ref{method}.
Figure 2: Our dataset is organized into six primary scenes. The first and second scenes, in the top row of the figure, feature the Modern Building and the Russian Building, respectively. The third and fourth scenes, in the second row, represent a campus and a college. The last row shows a village and a residence. The dataset was collected using high-precision LiDAR and high-resolution cameras, which demonstrates its multimodal capabilities. The area it covers exceeds 6.5 km² and includes thousands of aligned images. Both the point cloud and images are aligned in the COLMAP coordinate system.
Figure 3: This figure shows the design of the drone flight path. The white and orange dots mark the positions where the drone took pictures. The overall path for a scene, shown in panel (a), is composed of several micro-blocks. One such micro-block is highlighted in orange in panel (a), and panel (b) zooms into it. The total path length of each micro-block is limited by the battery life of the DJI Matrice 300 and by the power consumption of the LiDAR in windy conditions. For safety reasons, each micro-block typically covers an area of about $350 \times 350$ m. Each micro-block has five flight paths that provide different camera angles, as illustrated in panel (c). The first path captures a Bird's Eye View (BEV), while the other four tilt the camera by 45 degrees toward the horizontal plane, facing forward, backward, rightward, and leftward, respectively.
Figure 4: Our dataset provides essential information for quality control and for multi-modal analysis and visualization. With professional tools such as DJI Terra, one can inspect three properties critical for quality control: reflectivity, height, and return. Panel (a) illustrates reflectivity, which measures the amount of light reflected back to the LiDAR sensor from surfaces or objects. Panel (b) shows height, which represents the building's altitude relative to the drone's takeoff altitude. Panel (c) shows the return, which indicates the number of light returns detected by the LiDAR. Because our analysis keeps only points with at least two returns, moving objects, represented by red dots, are excluded. More visualization results can be explored in our dataset or in the supplementary materials.
Figure 5: Orange marks the point cloud in the COLMAP coordinate system, and blue marks the point cloud in the WGS coordinate system. Before applying the proposed matching algorithm, we filter the COLMAP point cloud according to each point's distance to the mean and downsample the LiDAR point cloud. We then rescale the LiDAR point cloud and register it either manually or with ICP.
...and 2 more figures
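
The preprocessing steps named in the Figure 5 caption (filter the COLMAP points by distance to the mean, downsample the LiDAR cloud, rescale it) can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' pipeline: all function names, the `keep_fraction` and `step` parameters, and the use of the RMS spread ratio as the scale statistic are hypothetical choices, since the captions do not specify the exact statistic. Rotation is deliberately not estimated here; per the caption, that is left to manual or ICP registration.

```python
import math


def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def filter_by_distance_to_mean(points, keep_fraction=0.9):
    """Keep the fraction of points closest to the centroid (assumed rule)."""
    c = centroid(points)
    ranked = sorted(points, key=lambda p: math.dist(p, c))
    return ranked[:max(1, int(len(ranked) * keep_fraction))]


def downsample(points, step):
    """Keep every `step`-th point (a simple stand-in for voxel downsampling)."""
    return points[::step]


def rms_spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))


def rescale_to_match(source, target):
    """Scale and translate `source` so its centroid and RMS spread match `target`."""
    scale = rms_spread(target) / rms_spread(source)
    cs, ct = centroid(source), centroid(target)
    moved = [tuple(ct[i] + scale * (p[i] - cs[i]) for i in range(3)) for p in source]
    return scale, moved


# Toy data (hypothetical): a unit cube in "COLMAP" units plus one SfM outlier,
# and the same cube at 10x scale with an offset, each LiDAR point duplicated.
signs = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
colmap_raw = [(0.5 * x, 0.5 * y, 0.5 * z) for x, y, z in signs] + [(50.0, 50.0, 50.0)]
lidar_raw = [(100 + 5.0 * x, 200 + 5.0 * y, 50 + 5.0 * z) for x, y, z in signs for _ in range(2)]

colmap_points = filter_by_distance_to_mean(colmap_raw, keep_fraction=0.9)
lidar_points = downsample(lidar_raw, step=2)
scale, lidar_in_colmap = rescale_to_match(lidar_points, colmap_points)
print(round(scale, 6), len(colmap_points), len(lidar_points))
```

In this toy case the outlier is dropped by the distance filter and the estimated scale recovers the 10:1 size ratio between the two clouds. On real data the two clouds cover the scene unevenly, so a spread-based scale is only a starting point that the subsequent registration step refines.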
