Alignment Scores: Robust Metrics for Multiview Pose Accuracy Evaluation

Seong Hun Lee; Javier Civera

TL;DR

This work introduces three robust, decoupled metrics for multiview pose evaluation: TAS for translation accuracy, RAS for rotation accuracy, and PAS, their mean, for full 6-DOF pose quality. Each metric robustly aligns the estimate to the ground truth and then builds a cumulative frequency histogram of the errors over a fixed set of thresholds. The translation thresholds range from $0.01d$ to $d$, where $d=\underset{i}{\mathrm{Q3}}\left(\min_{j\neq i}\|\mathbf{c}_i-\mathbf{c}_j\|\right)$ is the upper quartile of the closest-pair distances between camera positions $\mathbf{c}_i$; the rotation thresholds span $0.1^\circ$ to $10^\circ$. Both scores take the same form, $\mathrm{TAS}=\frac{1}{100n}\sum_k f_k$ and $\mathrm{RAS}=\frac{1}{100n}\sum_k f_k$, where $f_k$ is the cumulative frequency at the $k$-th translation or rotation threshold, respectively, $n$ is the number of camera poses, and the factor $100n$ normalizes the theoretical maximum to 1. By decoupling translation from rotation and using robust registration (a PCR-99 variant) and robust rotation averaging (sra2), the authors show that TAS is robust to outliers and collinear motion and much less sensitive to trajectory length than DTE and mAA, that RAS is more robust to outliers than existing rotation metrics, and that PAS provides a stable single-score summary. Extensive simulations compare these metrics against ATE, DTE, and mAA, highlighting the limitations of existing measures and the practical advantages of the proposed approach. The work concludes with usage guidelines and a discussion of limitations related to heuristic thresholds and aggregation choices, and argues that the metrics can improve reproducibility and interpretability in multiview pose evaluation.
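As a rough illustration of the RAS and PAS definitions summarized above, the following Python sketch scores a set of rotations that are assumed to be already aligned to the ground truth; the robust rotation averaging step (sra2) is not reproduced here. The function names, the $0.1^\circ$ threshold spacing (inferred from the 100 thresholds implied by the $\frac{1}{100n}$ normalization), and the use of `<=` when counting are assumptions made for this example, not details confirmed by the source.

```python
import numpy as np


def rotation_angle_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    R_rel = R_est @ R_gt.T
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))


def rotation_alignment_score(aligned_rotations, gt_rotations):
    """RAS sketch: normalized sum of cumulative frequencies of angular errors.

    `aligned_rotations` are assumed to be expressed in the ground-truth frame
    already (the robust alignment step is outside this sketch).
    """
    errors = np.array([
        rotation_angle_deg(R_est, R_gt)
        for R_est, R_gt in zip(aligned_rotations, gt_rotations)
    ])
    n = len(errors)
    # Assumed spacing: 100 thresholds at 0.1, 0.2, ..., 10.0 degrees.
    thresholds = np.arange(1, 101) / 10.0
    # f[k] = number of cameras whose angular error is within threshold k.
    f = (errors[None, :] <= thresholds[:, None]).sum(axis=1)
    return f.sum() / (100.0 * n)


def pose_alignment_score(tas, ras):
    """PAS is the mean of the translation and rotation alignment scores."""
    return 0.5 * (tas + ras)
```

With this sketch, a set of perfectly aligned rotations scores 1, and any camera whose angular error exceeds $10^\circ$ contributes nothing to the sum, so a gross outlier lowers the score by at most $1/n$.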

Abstract

We propose three novel metrics for evaluating the accuracy of a set of estimated camera poses given the ground truth: Translation Alignment Score (TAS), Rotation Alignment Score (RAS), and Pose Alignment Score (PAS). The TAS evaluates the translation accuracy independently of the rotations, and the RAS evaluates the rotation accuracy independently of the translations. The PAS is the average of the two scores, evaluating the combined accuracy of both translations and rotations. The TAS is computed in four steps: (1) Find the upper quartile of the closest-pair-distances, $d$. (2) Align the estimated trajectory to the ground truth using a robust registration method. (3) Collect all distance errors and obtain the cumulative frequencies for multiple thresholds ranging from $0.01d$ to $d$ with a resolution $0.01d$. (4) Add up these cumulative frequencies and normalize them such that the theoretical maximum is 1. The TAS has practical advantages over the existing metrics in that (1) it is robust to outliers and collinear motion, and (2) there is no need to adjust parameters on different datasets. The RAS is computed in a similar manner to the TAS and is also shown to be more robust against outliers than the existing rotation metrics. We verify our claims through extensive simulations and provide in-depth discussion of the strengths and weaknesses of the proposed metrics.
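The four TAS steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: step (2), the robust registration, is assumed to have been done beforehand, so the function takes already-aligned positions as input. Computing $d$ on the ground-truth positions, using NumPy's default (linearly interpolated) 75th percentile for the upper quartile, counting with `<=`, and all function names are assumptions made for this example.

```python
import numpy as np


def closest_pair_upper_quartile(positions):
    """Step 1: upper quartile (Q3) of each camera's nearest-neighbour distance."""
    pts = np.asarray(positions, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # exclude the j == i pairs
    # Assumption: NumPy's default linear-interpolation percentile as Q3.
    return np.percentile(dists.min(axis=1), 75)


def translation_alignment_score(aligned_positions, gt_positions):
    """TAS sketch, given positions already aligned to the ground truth (step 2)."""
    gt = np.asarray(gt_positions, dtype=float)
    est = np.asarray(aligned_positions, dtype=float)
    n = len(gt)
    # Assumption: d is computed on the ground-truth camera positions.
    d = closest_pair_upper_quartile(gt)
    # Step 3: distance errors and cumulative frequencies at 0.01d, 0.02d, ..., d.
    errors = np.linalg.norm(est - gt, axis=1)
    thresholds = d * np.arange(1, 101) / 100.0
    f = (errors[None, :] <= thresholds[:, None]).sum(axis=1)
    # Step 4: normalize so that the theoretical maximum is 1.
    return f.sum() / (100.0 * n)
```

Because the thresholds are expressed as fractions of $d$, which is derived from the trajectory itself, the sketch has no dataset-specific parameter to tune, which mirrors the second advantage claimed in the abstract.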

Paper Structure (17 sections, 1 equation, 7 figures, 1 table)

Introduction
Related Work
Translation Alignment Score (TAS)
Rotation Alignment Score (RAS)
Pose Alignment Score (PAS)
Evaluation
Evaluation of TAS
Robustness to outliers
Robustness to collinear motion
Impact of the trajectory length
Evaluation of RAS
Evaluation of PAS
Sensitivity to translation and rotation noise
Comparing mAA and PAS
Summary of Findings
...and 2 more sections

Figures (7)

Figure 1: [Top] Cumulative frequency histogram of the distance errors between the aligned camera positions and the ground truth. [Bottom] Cumulative frequency histogram of the angular errors between the aligned camera rotations and the ground truth. Note that the cumulative frequencies in these two histograms are not necessarily the same.
Figure 2: [Random translations] Comparison of the four translation metrics under different noise levels and numbers of outliers. The ATE is the most sensitive to outliers: When we add a single outlier in the estimation, it immediately loses its power to discern the varying noise level. The DTE is more robust than the ATE, but the mAA and TAS have stronger discerning power across a wider range of outliers. Comparing the mAA and TAS in terms of the sensitivity to the noise level, the latter is shown to have more consistent sensitivity across the range of outliers. For instance, the range of the mAA at 50 outliers is 74% smaller than that at zero outliers, while it is 51% smaller for the TAS.
Figure 3: [Collinear translations] Comparison of the four translation metrics under different noise levels and numbers of outliers. The ATE and DTE can handle collinear motion only in the absence of outliers: When we add a single outlier in the estimation, both metrics lose their power to discern the varying noise level. The mAA has weak discerning power, with or without outliers. The TAS is the only metric that maintains strong discerning power as the amount of noise and outliers varies.
Figure 4: [Random translations] Comparison of the four translation metrics under different noise levels and trajectory lengths. The DTE and mAA tend to favor larger datasets when the noise level is the same. The ATE and TAS are much less sensitive to the varying trajectory length than the other two, especially at moderate noise levels.
Figure 5: [Rotations] Comparison of the eight rotation metrics under different noise levels and numbers of outliers. The RAS has the strongest discerning power when the amount of noise and outliers varies. In other words, it can discern different noise levels when the number of outliers is fixed, and vice versa.
...and 2 more figures
