A Spatiotemporal Hand-Eye Calibration for Trajectory Alignment in Visual(-Inertial) Odometry Evaluation

Zichao Shu, Lijun Li, Rui Wang, Zetao Chen

TL;DR

The paper addresses a prerequisite of visual(-inertial) odometry evaluation: aligning an estimated trajectory with the ground truth in both time and reference frame, which it models as a spatiotemporal hand-eye calibration problem. It introduces a loosely-coupled pipeline with three modules: (1) time alignment based on the correlation of angular-velocity signals with quadratic refinement; (2) linear spatial calibration using rotationally constrained relative poses and a screw-theory-based robust kernel within a RANSAC framework; and (3) a batch nonlinear refinement that jointly optimizes the time offset and the extrinsic transformation using continuous-time B-splines and a Levenberg-Marquardt (LM) optimizer. Key contributions are a robust, pose-only calibration framework that uses screw theory to stabilize solutions under VO/VIO noise, and ablation and real-data evaluations that show improved accuracy and robustness over state-of-the-art methods. The approach makes VO/VIO evaluation more reliable and applies to AR/VR and robotics settings where the ground-truth trajectory is measured with high precision but the estimated trajectory is noisy or drift-prone, enabling fairer comparisons across systems. The work also identifies limitations on long-duration trajectories, caused by processing the full trajectory at once and assuming a fixed time offset, and points to future work on scalable, drift-aware spatiotemporal calibration.

Abstract

A common prerequisite for evaluating a visual(-inertial) odometry (VO/VIO) algorithm is to align the timestamps and the reference frame of its estimated trajectory with a reference ground-truth derived from a system of superior precision, such as a motion capture system. The trajectory-based alignment, typically modeled as a classic hand-eye calibration, significantly influences the accuracy of evaluation metrics. However, traditional calibration methods are susceptible to the quality of the input poses. Few studies have taken this into account when evaluating VO/VIO trajectories that usually suffer from noise and drift. To fill this gap, we propose a novel spatiotemporal hand-eye calibration algorithm that fully leverages multiple constraints from screw theory for enhanced accuracy and robustness. Experimental results show that our algorithm has better performance and is less noise-prone than state-of-the-art methods.
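
The screw-theory constraints mentioned in the abstract rest on a standard property of the hand-eye equation AX = XB: the relative motions A and B share the same rotation angle and the same translation along the rotation axis. The Python sketch below checks those two invariants for one pair of relative poses. The function names and the use of 4x4 homogeneous matrices are assumptions for illustration, not the paper's implementation, and the paper's robust kernel may combine or weight the constraints differently.

```python
import numpy as np

def screw_parameters(T):
    """Return (theta, d) for a 4x4 rigid transform T.

    theta: rotation angle in [0, pi].
    d: translation component along the rotation axis.
    Assumes 0 < theta < pi so the axis is well defined (sketch only).
    """
    R, t = T[:3, :3], T[:3, 3]
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = float(np.arccos(cos_theta))
    # The skew-symmetric part of R equals sin(theta) * [axis]_x.
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    norm = np.linalg.norm(axis)
    if norm < 1e-9:
        raise ValueError("rotation too close to 0 or pi for this sketch")
    axis = axis / norm
    return theta, float(axis @ t)

def screw_residuals(A, B):
    """Absolute mismatch in rotation angle and axial translation between
    two relative motions that should satisfy A X = X B."""
    theta_a, d_a = screw_parameters(A)
    theta_b, d_b = screw_parameters(B)
    return abs(theta_a - theta_b), abs(d_a - d_b)
```

In a RANSAC-style loop, a pose pair whose residuals exceed chosen thresholds could be treated as an outlier; the thresholds themselves would be tuning parameters and are not specified here.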

Paper Structure (17 sections, 16 equations, 9 figures, 2 tables, 1 algorithm)

Figures (9)

  • Figure 1: Our spatiotemporal hand-eye calibration platform and the convention of the reference frames. The global frame for MoCap trajectory is denoted as G, and the local frame is referenced to a specific tracking marker indicated as H. The global frame for VIO trajectory is denoted as W, and the local frame E coincides with the IMU body frame. During this process, dashed lines represent transformations that change over time, while solid lines indicate static offsets.
  • Figure 2: Flowchart of the proposed spatiotemporal hand-eye calibration, where the green and blue parallelograms represent the inputs and outputs respectively, and the orange rectangles represent the critical processing steps.
  • Figure 3: Illustration of time alignment. The time offset is determined by fitting a quadratic polynomial around the maximum (highlighted in gray) of the correlation function and taking the location of the fitted maximum. This synchronizes the angular-velocity signals at a finer granularity than the discrete correlation index alone.
  • Figure 4: Illustration of relative pose construction. The three methods are drawn as different lines; ours is the solid green line.
  • Figure 5: Performance comparison of different time alignment methods under various noise levels. We report the mean and standard deviation of the time alignment error.
  • ...and 4 more figures
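
The time-alignment step described in the Figure 3 caption can be sketched as follows: cross-correlate the two angular-velocity signals, take the discrete peak, then fit a parabola through the peak and its two neighbours. This is a minimal sketch, assuming both signals are angular-velocity magnitudes already resampled to a common uniform period `dt`. The function name `estimate_time_offset`, the three-point fit window, and the mean removal are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

def estimate_time_offset(w_ref, w_est, dt):
    """Estimate how much w_est lags w_ref, in seconds.

    w_ref, w_est: 1-D arrays of angular-velocity magnitudes sampled
    with the same uniform period dt (an assumption of this sketch).
    A positive result means w_est is delayed relative to w_ref.
    """
    a = np.asarray(w_ref, dtype=float)
    b = np.asarray(w_est, dtype=float)
    a = a - a.mean()
    b = b - b.mean()

    # Full cross-correlation; entry i corresponds to lag lags[i] (in samples).
    corr = np.correlate(b, a, mode="full")
    lags = np.arange(-(len(a) - 1), len(b))
    k = int(np.argmax(corr))

    # Quadratic refinement: vertex of the parabola through the peak
    # and its two neighbours gives a fractional correction to the lag.
    frac = 0.0
    if 0 < k < len(corr) - 1:
        y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            frac = 0.5 * (y0 - y2) / denom

    return (lags[k] + frac) * dt
```

The parabola fit is what lets the estimate fall between samples; without it, the offset would be quantized to multiples of `dt`.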