Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging Conditions

Yi-Fan Zuo, Wanting Xu, Xia Wang, Yifu Wang, Laurent Kneip

TL;DR

This work addresses robust visual localization under challenging conditions by enabling cross-modal tracking of an event camera against semi-dense 3D priors from either a depth sensor or a prior image-based map. It introduces Signed Time Surface Maps (STSMs) to exploit event polarity and employs a priority-aware occlusion culling strategy via ANNFs, enabling fast, reliable 6-DoF pose tracking. The paper presents two pipelines: Canny-DEVO (local, depth-assisted mapping) and Canny-EVT (global map-based tracking using events), and demonstrates superior performance over purely event-based and RGB-D methods across HDR, motion, and illumination variations. The results highlight the practical viability of cross-modal event-based tracking for real-time localization in robotics and AR, with open-source code to foster adoption. The contributions advance edge-based, semi-dense registration by combining polarity-aware registration, occlusion handling, and cross-modal priors to achieve robust, efficient tracking in challenging environments.

Abstract

Vision-based localization is a cost-effective and thus attractive solution for many intelligent mobile platforms. However, its accuracy and especially robustness still suffer from low illumination conditions, illumination changes, and aggressive motion. Event-based cameras are bio-inspired visual sensors that perform well in HDR conditions and have high temporal resolution, and thus provide an interesting alternative in such challenging scenarios. While purely event-based solutions currently do not yet produce satisfying mapping results, the present work demonstrates the feasibility of purely event-based tracking if an alternative sensor is permitted for mapping. The method relies on geometric 3D-2D registration of semi-dense maps and events, and achieves highly reliable and accurate cross-modal tracking results. Practically relevant scenarios are given by depth camera-supported tracking or map-based localization with a semi-dense map prior created by a regular image-based visual SLAM or structure-from-motion system. Conventional edge-based 3D-2D alignment is extended by a novel polarity-aware registration that makes use of signed time-surface maps (STSM) obtained from event streams. We furthermore introduce a novel culling strategy for occluded points. Both modifications increase the speed of the tracker and its robustness against occlusions or large view-point variations. The approach is validated on many real datasets covering the above-mentioned challenging conditions, and compared against similar solutions realised with regular cameras.
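
To make the idea of a signed time surface more concrete, the following is a minimal sketch and not the paper's implementation. It assumes an exponentially decayed time surface in which each pixel keeps the timestamp of its most recent event and is multiplied by that event's polarity. The function name `signed_time_surface`, the event tuple layout `(x, y, t, polarity)`, and the decay constant `tau` are all illustrative assumptions.

```python
import math

def signed_time_surface(events, width, height, t_ref, tau=0.03):
    """Illustrative signed time surface (assumed formulation, not the paper's).

    events: iterable of (x, y, t, polarity), with polarity in {+1, -1}.
    Returns a height x width grid with values in [-1, 1]: the sign is the
    polarity of the latest event at that pixel, the magnitude decays
    exponentially with the age of that event relative to t_ref.
    """
    last_t = [[None] * width for _ in range(height)]
    last_p = [[0] * width for _ in range(height)]

    # Keep only the most recent event (and its polarity) per pixel.
    for x, y, t, p in events:
        if t > t_ref:
            continue  # ignore events after the reference time
        if last_t[y][x] is None or t >= last_t[y][x]:
            last_t[y][x] = t
            last_p[y][x] = p

    # Pixels without events stay at 0.0.
    surface = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if last_t[y][x] is not None:
                decay = math.exp(-(t_ref - last_t[y][x]) / tau)
                surface[y][x] = last_p[y][x] * decay
    return surface

# Hypothetical two-pixel example: an older positive event and a fresh negative one.
example = signed_time_surface(
    [(0, 0, 0.00, +1), (1, 0, 0.03, -1)], width=2, height=1, t_ref=0.03
)
```

In this sketch a recent negative event yields a value close to -1 and an older positive event a smaller positive value, so the sign carries polarity information that an unsigned time surface would discard.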

Paper Structure (28 sections, 9 equations, 14 figures, 12 tables)

This paper contains 28 sections, 9 equations, 14 figures, 12 tables.

Figures (14)

  • Figure 1: Overview of our proposed Canny-DEVO visual odometry pipeline.
  • Figure 2: Overview of our proposed Canny-EVT visual tracking pipeline.
  • Figure 3: Projection of the gradient vector. Projection of the 2D gradient from an RGB image onto the 3D support plane of that point. The normal vector of the support plane is defined to be parallel to the vector pointing from the camera center to the 3D point.
  • Figure 4: 6-DoF camera tracking. Note that the depth image indicated in the dashed frame is not used by the tracking module.
  • Figure 5: Visualization of the quantization used for predicting the polarity of events.
  • ...and 9 more figures