A Spatiotemporal Hand-Eye Calibration for Trajectory Alignment in Visual(-Inertial) Odometry Evaluation
Zichao Shu, Lijun Li, Rui Wang, Zetao Chen
TL;DR
The paper addresses a prerequisite for evaluating visual(-inertial) odometry: aligning an estimated trajectory to the ground truth in both time and reference frame, which it models as a spatiotemporal hand-eye calibration problem. It introduces a loosely-coupled pipeline with three modules: (1) time alignment based on correlating angular-velocity signals, followed by a quadratic refinement; (2) linear spatial calibration using rotationally constrained relative poses and a screw-theory-based robust kernel within a RANSAC framework; and (3) a batch nonlinear refinement that jointly optimizes the time offset and the extrinsic transformation using continuous-time B-splines and a Levenberg-Marquardt (LM) optimizer. Key contributions include a robust, pose-only calibration framework that uses screw theory to stabilize solutions under VO/VIO noise, together with ablation studies and real-data evaluations that show improved accuracy and robustness over state-of-the-art methods. The approach makes VO/VIO evaluation more reliable and applies to AR/VR and robotics settings where the ground-truth trajectory is measured with high precision but the estimated trajectory is noisy or drift-prone, enabling fairer comparisons across systems. The work also identifies two limitations for long-duration trajectories, namely full-trajectory processing and the assumption of a fixed time offset, and points to future work on scalable, drift-aware spatiotemporal calibration.
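The time-alignment module described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `estimate_time_offset` is hypothetical, both inputs are assumed to be angular-velocity magnitudes already resampled onto the same uniform grid with step `dt`, and the "quadratic refinement" is interpreted here as a parabola fit through the three correlation samples around the peak.

```python
import numpy as np

def estimate_time_offset(w_est, w_ref, dt):
    """Estimate the delay (seconds) of w_est relative to w_ref.

    Assumption: w_est(t) ~= w_ref(t - offset), both uniformly sampled
    with step dt. A positive result means w_est lags behind w_ref.
    """
    w_est = np.asarray(w_est, dtype=float)
    w_ref = np.asarray(w_ref, dtype=float)

    # Full cross-correlation; output index 0 corresponds to
    # a lag of -(len(w_ref) - 1) samples.
    corr = np.correlate(w_est, w_ref, mode="full")
    k = int(np.argmax(corr))

    # Sub-sample refinement: vertex of the parabola through the
    # peak sample and its two neighbours.
    delta = 0.0
    if 0 < k < len(corr) - 1:
        y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            delta = 0.5 * (y0 - y2) / denom

    lag_samples = (k - (len(w_ref) - 1)) + delta
    return lag_samples * dt
```

A coarse estimate of this kind would typically only initialize a later refinement stage such as the batch nonlinear optimization mentioned above. Note that plain zero-padded correlation is slightly biased toward zero lag when the signals do not decay to zero at the window edges.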
Abstract
A common prerequisite for evaluating a visual(-inertial) odometry (VO/VIO) algorithm is to align the timestamps and the reference frame of its estimated trajectory with a ground truth obtained from a system of superior precision, such as a motion capture system. This trajectory-based alignment, typically modeled as a classic hand-eye calibration, significantly influences the accuracy of evaluation metrics. However, traditional calibration methods are sensitive to the quality of the input poses, and few studies have taken this into account when evaluating VO/VIO trajectories, which usually suffer from noise and drift. To fill this gap, we propose a novel spatiotemporal hand-eye calibration algorithm that fully leverages multiple constraints from screw theory for enhanced accuracy and robustness. Experimental results show that our algorithm performs better and is more robust to noise than state-of-the-art methods.
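For readers unfamiliar with the formulation, the classic hand-eye problem and the best-known screw-theory constraint can be written as follows. The notation is an assumption introduced here for illustration: A_i and B_i denote the i-th relative motions of the ground-truth and estimated trajectories over the same (time-aligned) interval, and X is the unknown rigid transformation between the two tracked frames. The excerpt does not state which specific constraints the paper uses, so this shows only the standard ones.

```latex
% Hand-eye equation for each pair of relative motions
A_i X = X B_i, \qquad
A_i = \begin{bmatrix} R_{A_i} & t_{A_i} \\ 0 & 1 \end{bmatrix}, \quad
B_i = \begin{bmatrix} R_{B_i} & t_{B_i} \\ 0 & 1 \end{bmatrix}, \quad
X = \begin{bmatrix} R_X & t_X \\ 0 & 1 \end{bmatrix}

% Split into rotation and translation parts
R_{A_i} R_X = R_X R_{B_i}, \qquad
(R_{A_i} - I)\, t_X = R_X t_{B_i} - t_{A_i}

% Because A_i = X B_i X^{-1}, the two motions are conjugate and share
% the same screw parameters: rotation angle and translation along the axis
\theta_{A_i} = \theta_{B_i}, \qquad d_{A_i} = d_{B_i}
```

Since the last two equalities hold for any valid X, a motion pair that violates them strongly is likely corrupted by noise or drift, which is one way such invariants can be used to down-weight or reject pose pairs.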
