MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework

Xiangcheng Hu, Jin Wu, Mingkai Jia, Hongyu Yan, Yi Jiang, Binqian Jiang, Wei Zhang, Wei He, Ping Tan

TL;DR

MapEval tackles the challenge of evaluating massive SLAM maps by proposing a unified framework that jointly assesses global geometry and local consistency. It introduces two complementary metrics, AWD and SCS, derived from voxel-wise Gaussian representations and Wasserstein distances to achieve robustness and scalability. Extensive experiments show 100-500× speedups over traditional metrics while preserving evaluation fidelity across simulated and real-world datasets, and reveal useful trade-offs between global accuracy and local consistency. The framework and open-source release aim to standardize SLAM map evaluation in robotics, enabling fair comparisons and robust quality assessment in diverse environments.

Abstract

Evaluating massive-scale point cloud maps in Simultaneous Localization and Mapping (SLAM) remains challenging, primarily due to the absence of unified, robust and efficient evaluation frameworks. We present MapEval, an open-source framework for comprehensive quality assessment of point cloud maps, specifically addressing SLAM scenarios where the ground truth map is inherently sparse compared to the mapped environment. Through systematic analysis of existing evaluation metrics in SLAM applications, we identify their fundamental limitations and establish clear guidelines for consistent map quality assessment. Building upon these insights, we propose a novel Gaussian-approximated Wasserstein distance in voxelized space, enabling two complementary metrics under the same error standard: Voxelized Average Wasserstein Distance (AWD) for global geometric accuracy and Spatial Consistency Score (SCS) for local consistency evaluation. This theoretical foundation leads to significant improvements in both robustness against noise and computational efficiency compared to conventional metrics. Extensive experiments on both simulated and real-world datasets demonstrate that MapEval is 100 to 500 times faster while maintaining evaluation integrity. The MapEval library (https://github.com/JokerJohn/Cloud_Map_Evaluation) will be publicly available to promote standardized map evaluation practices in the robotics community.
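To make the idea of a Gaussian-approximated Wasserstein distance in voxelized space concrete, the following is a minimal sketch and not the MapEval implementation. It relies on the standard closed form for the 2-Wasserstein distance between two Gaussians, W2² = ‖μa − μb‖² + Tr(Σa + Σb − 2(Σb^½ Σa Σb^½)^½), and averages it over voxels occupied in both maps. The function names, the voxel size, the `min_points` threshold, and the plain unweighted average are all illustrative assumptions; the paper's exact AWD definition may differ.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """Closed-form 2-Wasserstein distance between two Gaussians."""
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    w2_sq = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))  # clamp round-off below zero


def voxel_gaussians(points, voxel_size, min_points=5):
    """Group points by voxel index and fit a mean and covariance per voxel.

    min_points is an assumed threshold so each covariance is well defined.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    buckets = {}
    for key, point in zip(map(tuple, keys.tolist()), points):
        buckets.setdefault(key, []).append(point)
    stats = {}
    for key, pts in buckets.items():
        if len(pts) >= min_points:
            arr = np.asarray(pts)
            stats[key] = (arr.mean(axis=0), np.cov(arr, rowvar=False))
    return stats


def average_voxel_w2(map_points, ref_points, voxel_size=1.0):
    """Mean Gaussian W2 over voxels occupied in both clouds (NaN if none)."""
    est = voxel_gaussians(map_points, voxel_size)
    ref = voxel_gaussians(ref_points, voxel_size)
    shared = est.keys() & ref.keys()
    if not shared:
        return float("nan")
    return float(np.mean([gaussian_w2(*est[k], *ref[k]) for k in shared]))


# Synthetic usage: a uniform cloud compared with itself and with a shifted copy.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 4.0, size=(4000, 3))
shifted = reference + np.array([0.05, 0.0, 0.0])
print(average_voxel_w2(reference, reference))  # identical clouds: close to zero
print(average_voxel_w2(shifted, reference))    # shifted cloud: larger value
```

Because each voxel is summarized by a mean and a covariance, the cost per voxel is a few 3×3 eigendecompositions rather than a nearest-neighbor search over every point. A handful of far-away outliers also tends to land in voxels that have no counterpart in the other map, so in this sketch they are simply not scored.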

Paper Structure

This paper contains 37 sections, 10 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Mapping evaluation for PALoc on sequence S1. Left: Full error map visualization with regions A and B highlighted. Right: Zoomed views of the highlighted regions. The colormap represents geometric error (in cm), ranging from low (blue) to high (red).
  • Figure 2: The MapEval pipeline (see the evaluation pipeline section). The framework first acquires dense point cloud maps from both the ground truth sensor and SLAM algorithms (left), performs dense map alignment with an initial pose estimate (middle), and evaluates mapping quality through geometric error and local consistency metrics (right).
  • Figure 3: (a) Multi-sensor data platform. (b) Leica RTC360 scanner employed for ground truth map collection.
  • Figure 4: Comparison of evaluation metrics on the S1 ground truth map with Gaussian noise of varying range (100 to 1,000,000 cm) applied to 0.1% randomly sampled points (see the noise evaluation table). While Chamfer Distance (CD) exhibits high sensitivity to outliers, the proposed AWD shows superior robustness across different noise scales.
  • Figure 5: Comparative evaluation of FL2 (row A) and PALoc (row B) on S14 (see the comprehensive results table). From left to right: (1) geometric error AC visualization (blue: 0 cm to red: 20 cm); (2) voxel-wise error distribution; (3) CDF and 3σ bound analysis; (4) SCS visualization. Despite significant global drift errors, CD remains nearly constant, while PALoc demonstrates superior performance in both AWD and SCS compared to FL2.
  • ...and 2 more figures