Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis

Ziqian Ni, Sicong Du, Zhenghua Hou, Chenming Wu, Sheng Yang

TL;DR

Para-Lane addresses the lack of real-world benchmarks for cross-lane novel view synthesis by constructing a real multi-lane dataset with synchronized LiDAR and camera data across parallel lanes. A two-phase pose optimization aligns the multi-modal data: it first builds a LiDAR map from multi-pass scans, then registers camera frames to this map for robust cross-modal alignment. The authors benchmark NeRF- and 3DGS-based methods on the dataset, revealing that training view distribution and real-world domain gaps significantly impact NVS quality, with Scaffold-GS performing best in this setting. The dataset provides extensive front- and surround-view imagery and dynamic-object labels, and the authors plan ongoing expansion to support robust evaluation and closed-loop autonomous driving simulations.

Abstract

To evaluate end-to-end autonomous driving systems, a simulation environment based on Novel View Synthesis (NVS) techniques is essential: it synthesizes photo-realistic images and point clouds from previously recorded sequences under new vehicle poses, particularly in cross-lane scenarios. The development of a multi-lane dataset and benchmark is therefore necessary. While recent synthetic scene-based NVS datasets have been prepared for cross-lane benchmarking, they still lack the realism of captured images and point clouds. To further assess the performance of existing methods based on NeRF and 3DGS, we present the first multi-lane dataset registering parallel scans, built specifically for novel driving view synthesis and derived from real-world scans. It comprises 25 groups of associated sequences, including 16,000 front-view images, 64,000 surround-view images, and 16,000 LiDAR frames. All frames are labeled to differentiate moving objects from static elements. Using this dataset, we evaluate the performance of existing approaches in various testing scenarios at different lanes and distances. Additionally, we provide a method for solving and assessing the quality of multi-sensor poses for multi-modal data alignment, which is needed to curate such a dataset in the real world. We plan to continually add new sequences to test the generalization of existing methods across different scenarios. The dataset is released publicly at the project page: https://nizqleo.github.io/paralane-dataset/.

Paper Structure

This paper contains 16 sections, 4 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Our work introduces the first real-world multi-lane dataset for evaluating the novel view synthesis capabilities of recent reconstruction approaches for autonomous driving. Public urban roads are scanned using multi-pass trajectories with three laser scanners, a front-view camera, and four surround-view cameras. Frame-wise poses are accurately aligned through LiDAR mapping and multi-modal Structure-from-Motion techniques. Here, we present example images captured from close positions in three aligned cross-lane sequences, with a shared point cloud projected onto the images based on our optimized camera-LiDAR poses.
  • Figure 2: Sensor assembly and sample frames of our unmanned data-collection vehicle. The right fisheye camera is mounted symmetrically opposite the left fisheye, and the back fisheye is located at the center of the back side.
  • Figure 3: LiDAR map stitching quality, visualized with both a rainbow height ramp that repeats every 20 cm (left columns) and a 10 cm cividis colormap reflecting distance to the reconstructed mesh (right column). Both the error map and the zoomed-in views show that the refined LiDAR frame poses (second row), compared to the initial RTK trajectory (first row), yield a thinner stitched cloud with fewer hovering noisy points.
  • Figure 4: Factors used in our cross-modal pose optimization framework. We visualize LiDAR-camera alignment quality by alpha-blending the colorized intensity map onto its corresponding camera frame. We refer readers to our supplementary video for the data alignment quality of our LiDAR map and multiple camera frames.
  • Figure 5: Five evaluation tracks using different combinations of lanes for training (colored in blue) and testing (colored in red).
  • ...and 2 more figures
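
The Figure 1 caption describes projecting a shared point cloud onto images using the optimized camera-LiDAR poses. A minimal sketch of that kind of projection is given below, assuming an undistorted pinhole camera and a world-to-camera pose given as a rotation `R_cw` and translation `t_cw`. The function name, parameter names, and the pinhole model itself are illustrative assumptions, not the authors' code; the fisheye surround-view cameras would need a different camera model.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def project_point(
    p_world: Vec3,
    R_cw: Mat3,   # hypothetical name: world-to-camera rotation
    t_cw: Vec3,   # hypothetical name: world-to-camera translation
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> Optional[Tuple[float, float]]:
    """Project a world-frame point to pixel coordinates.

    Returns None when the point is behind the camera or falls
    outside the image. Assumes an ideal pinhole model (no distortion).
    """
    # Rigid transform into the camera frame: p_cam = R_cw * p_world + t_cw
    x, y, z = (
        sum(R_cw[i][j] * p_world[j] for j in range(3)) + t_cw[i]
        for i in range(3)
    )
    if z <= 0.0:
        return None  # point is behind (or on) the image plane
    # Perspective division followed by the intrinsic mapping
    u = fx * x / z + cx
    v = fy * y / z + cy
    if 0.0 <= u < width and 0.0 <= v < height:
        return (u, v)
    return None


# Usage with made-up values: identity pose, 200x100 image
IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
pixel = project_point((1.0, 0.5, 2.0), IDENTITY, (0.0, 0.0, 0.0),
                      100.0, 100.0, 50.0, 40.0, 200, 100)
```

If the poses are well aligned, projected map points should land on the matching image structures (lane markings, curbs, poles), which is the visual check the caption refers to.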