Joint Spatial-Temporal Calibration for Camera and Global Pose Sensor
Junlin Song, Antoine Richard, Miguel Olivares-Mendez
TL;DR
This work addresses spatial-temporal calibration between a monocular camera and a global pose sensor. It introduces two methods: an offline target-based calibration that uses a grid of AprilTags to jointly optimize the camera intrinsics, the camera-to-marker transform ${}_M^C T$, a global pose transform ${}_W^G T$, and the trajectory; and an online target-less EKF-based method that estimates time-varying spatial-temporal parameters. The authors provide a detailed observability analysis for the target-less method, showing full observability under fully excited 6DoF motion and characterizing the degenerate motions. Simulation studies and real-world hand-held experiments (including sequences with and without targets) validate accuracy, consistency, and the ability to track time-varying parameters. The methods enable accurate evaluation of localization algorithms and support broader computer-vision tasks that rely on precise spatial-temporal alignment, even when traditional hand-eye calibration is infeasible. The work also derives analytical on-manifold Jacobians for the target-based method, integrates robust data fusion, and demonstrates practical applicability on public datasets such as TUM-VI, with implications for visual SLAM, annotation, and multi-sensor fusion.
Abstract
In robotics, motion capture systems have been widely used to measure the accuracy of localization algorithms. Moreover, this infrastructure can also be used for other computer vision tasks, such as the evaluation of Visual(-Inertial) SLAM dynamic initialization, multi-object tracking, or automatic annotation. Yet, to work optimally, these functionalities require accurate and reliable spatial-temporal calibration parameters between the camera and the global pose sensor. In this study, we provide two novel solutions to estimate these calibration parameters. First, we design an offline target-based method with high accuracy and consistency, in which the spatial-temporal parameters, camera intrinsics, and trajectory are optimized simultaneously. Then, we propose an online target-less method, which eliminates the need for a calibration target and enables the estimation of time-varying spatial-temporal parameters. Additionally, we perform a detailed observability analysis for the target-less method. Our theoretical findings regarding observability are validated by simulation experiments and provide explainable guidelines for calibration. Finally, the accuracy and consistency of the two proposed methods are evaluated on hand-held real-world datasets where traditional hand-eye calibration methods do not work.
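
To make concrete what "spatial-temporal calibration parameters" refer to, the following is a minimal sketch, not the authors' implementation. It assumes the spatial parameter is a rigid transform from the camera to the tracked body (here called `T_body_cam`) and the temporal parameter is a constant offset `t_d` such that sensor time = camera time + `t_d`. The frame names, the sign convention of `t_d`, and all function names are illustrative assumptions and do not follow the paper's notation.

```python
import numpy as np

def so3_exp(w):
    """Rotation vector -> rotation matrix (Rodrigues formula)."""
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-9:
        return np.eye(3) + K  # first-order approximation for tiny angles
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def so3_log(R):
    """Rotation matrix -> rotation vector (not robust near 180 degrees)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-9:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

def interpolate_pose(t, t0, T0, t1, T1):
    """Interpolate a 4x4 pose between two stamped poses (t0 <= t <= t1)."""
    a = (t - t0) / (t1 - t0)
    R0, R1 = T0[:3, :3], T1[:3, :3]
    T = np.eye(4)
    # Geodesic interpolation on SO(3), linear interpolation for translation.
    T[:3, :3] = R0 @ so3_exp(a * so3_log(R0.T @ R1))
    T[:3, 3] = (1.0 - a) * T0[:3, 3] + a * T1[:3, 3]
    return T

def predict_camera_pose(t_cam, t_d, T_body_cam, t0, T0, t1, T1):
    """Camera pose in the sensor's reference frame at camera time t_cam.

    Assumed convention: the image stamped t_cam corresponds to sensor time
    t_cam + t_d; T0 and T1 are body poses measured at sensor times t0, t1.
    """
    T_ref_body = interpolate_pose(t_cam + t_d, t0, T0, t1, T1)
    return T_ref_body @ T_body_cam

# Example: the body rotates 90 degrees about z and moves 1 m along x in 1 s.
T0 = np.eye(4)
T1 = np.eye(4)
T1[:3, :3] = so3_exp(np.array([0.0, 0.0, np.pi / 2]))
T1[:3, 3] = [1.0, 0.0, 0.0]
T_body_cam = np.eye(4)
T_body_cam[:3, 3] = [0.0, 0.0, 1.0]  # camera mounted 1 m above the body origin

T_ref_cam = predict_camera_pose(0.4, 0.1, T_body_cam, 0.0, T0, 1.0, T1)
```

An error in either `T_body_cam` or `t_d` shifts the predicted camera pose, which is why both must be calibrated before global pose measurements can serve as ground truth for camera-based algorithms.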
