
VBR: A Vision Benchmark in Rome

Leonardo Brizi, Emanuele Giacomini, Luca Di Giammarino, Simone Ferrari, Omar Salem, Lorenzo De Rebotti, Giorgio Grisetti

TL;DR

VBR introduces a Rome-sourced, multi-sensor vision benchmark tailored for SLAM and odometry by providing six synchronized sequences acquired with handheld and car platforms. Ground truth is generated through a LiDAR Bundle Adjustment approach that fuses RTK-GPS priors with LiDAR odometry, achieving about $\pm 3\ \mathrm{cm}$ accuracy over long trajectories, and is validated with a Total Station. The dataset spans urban, garden, indoor, and highway-like scenes, totaling roughly $40\ \mathrm{km}$ of trajectories and $2\ \mathrm{TB}$ of raw data, with training/testing splits and a public evaluation server. Baseline experiments with KISS-ICP, F-LOAM, and ORB-SLAM3 illustrate the strengths of LiDAR-based methods and highlight the challenges of achieving precise global localization in diverse environments. This resource enables robust, fair benchmarking across robotic platforms (quadrupeds, quadrotors, autonomous vehicles) and supports future work in semantics and dense perception alongside odometry and SLAM evaluation.

Abstract

This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsic and extrinsic parameters of the sensors while addressing temporal synchronization. The recordings cover multi-floor buildings, gardens, and urban and highway scenarios. By combining handheld and car-based data collection, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences, divided into training and testing sets, are accessible through our website.

Paper Structure (17 sections, 4 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: A summary of our dataset. Data illustrating some of the recorded sequences (top). 3D mapping done with our ground truth (bottom).
  • Figure 2: Comparison between LiDAR clouds attached to the ground truth trajectories of KITTI (top) and ours (bottom). The zoom shows the elevation view.
  • Figure 3: Projection of the KITTI LiDAR point cloud onto an image plane (top) and projection of our LiDAR onto an image plane (bottom). The many holes in the top image, caused by the uneven distribution of the LiDAR beams and by calibration issues, make the KITTI LiDAR image unusable for computer vision tasks.
  • Figure 4: Sensor setup and reference frames. Our ground truth is expressed in the LiDAR reference frame $\mathrm{RF_{L}}$. More details can be found on our website and in the supplementary materials.
  • Figure 5: Counts of the 20 most frequent semantic instances for the Ciampino (top) and Colosseum (bottom) sequences. The instances were counted using OneFormer [jain2023oneformer] over a subset of images for each sequence, excluding the most predominant classes: sky, wall, road, grass, sidewalk, ground.
  • ...and 2 more figures
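
The LiDAR-to-image projection mentioned in the Figure 3 caption can be illustrated with a minimal pinhole-camera sketch. This is not the authors' calibration or projection code: the function names, the extrinsic transform `(R, t)` from the LiDAR frame to the camera frame, and the intrinsics `fx, fy, cx, cy` are all hypothetical placeholders, and lens distortion is ignored.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_camera_frame(R: Sequence[Sequence[float]], t: Sequence[float], p: Vec3) -> Vec3:
    """Apply the rigid transform p_cam = R * p_lidar + t (assumed extrinsics)."""
    return tuple(
        sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
    )


def project_lidar_to_image(
    points: Sequence[Vec3],
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[Tuple[float, float, float]]:
    """Return (u, v, depth) for every point that lands inside the image.

    Assumes an ideal pinhole camera with z pointing forward; points behind
    the camera or outside the image bounds are discarded.
    """
    pixels = []
    for p in points:
        x, y, z = to_camera_frame(R, t, p)
        if z <= 0.0:  # behind the camera, cannot be projected
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, z))
    return pixels


if __name__ == "__main__":
    # Toy data: identity extrinsics, so points are already in camera axes.
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    cloud = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.5, -0.2, 2.0), (0.0, 0.0, -1.0)]
    for u, v, depth in project_lidar_to_image(
        cloud, identity, (0, 0, 0), 100.0, 100.0, 50.0, 40.0, 100, 80
    ):
        print(f"u={u:.1f} v={v:.1f} depth={depth:.1f}")
```

In a sketch like this, a sparse or unevenly distributed set of LiDAR beams leaves many pixels without a depth value, which is the kind of hole the caption describes; extrinsic errors in `(R, t)` would additionally shift the projected points away from the matching image content.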