Event-based Motion & Appearance Fusion for 6D Object Pose Tracking

Zhichao Li, Chiara Bartolozzi, Lorenzo Natale, Arren Glover

TL;DR

This work propagates the object pose using the 6D object velocity obtained from event-based optical flow, then refines the propagated pose with a template-based local pose correction module.

Abstract

Object pose tracking is a fundamental task for robots operating in home and industrial settings. The most commonly used sensors for it are RGB-D cameras, which hit limitations in highly dynamic environments due to motion blur and frame-rate constraints. Event cameras offer high temporal resolution and low latency, which make them potentially ideal vision sensors for object pose tracking at high speed. Even so, there are still only a few works on 6D pose tracking with event cameras. In this work, we take advantage of the high temporal resolution and propose a method that fuses a pose propagation step with a pose correction strategy. Specifically, we use the 6D object velocity obtained from event-based optical flow for pose propagation, after which a template-based local pose correction module refines the propagated pose. Our learning-free method has comparable performance to state-of-the-art algorithms, and in some cases outperforms them for fast-moving objects. The results indicate the potential of event cameras in highly dynamic scenarios where deep network approaches are limited by low update rates.
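
The propagation step described in the abstract can be illustrated with a minimal sketch. It assumes a constant 6D velocity over a short interval `dt`, with the linear velocity `v` and angular velocity `w` both expressed in the camera frame; the function names, the frame convention, and the first-order integration scheme are illustrative assumptions, not the authors' implementation.

```python
import math


def matmul(A, B):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rodrigues(rotvec):
    """Rotation matrix for an axis-angle vector (Rodrigues' formula)."""
    theta = math.sqrt(sum(c * c for c in rotvec))
    if theta < 1e-12:
        # Negligible rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in rotvec)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # skew-symmetric axis matrix
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]


def propagate_pose(R, t, v, w, dt):
    """Advance pose (R, t) by dt under constant velocity (hypothetical helper).

    Assumption: v (m/s) and w (rad/s) are expressed in the camera frame,
    so the incremental rotation is applied on the left.
    """
    dR = rodrigues([wi * dt for wi in w])
    R_new = matmul(dR, R)
    t_new = [ti + vi * dt for ti, vi in zip(t, v)]
    return R_new, t_new


# Example: object 1 m in front of the camera, spinning about the optical axis.
R0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t0 = [0.0, 0.0, 1.0]
R1, t1 = propagate_pose(R0, t0, v=[0.1, 0.0, 0.0], w=[0.0, 0.0, math.pi / 2], dt=1.0)
```

In practice `dt` would be very small, since the velocity estimate comes from a high-rate event stream. Integrating velocity alone accumulates drift, which is why the method pairs it with a correction step.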

Paper Structure (26 sections, 13 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Overview of the proposed pipeline for 6D object pose tracking with an event camera. Events captured by the event camera are used to update a velocity-independent representation (EROS) [10611511] and, simultaneously, to extract event-based optical flow. The 6D object pose is propagated using the 6D object velocity estimated from the optical flow measurement. Several appearance templates are generated from the propagated pose with additional small pose perturbations, and edges are extracted from each template. The propagated 6D pose is corrected using the template that best matches the EROS representation. Finally, an unscented Kalman filter smooths the corrected pose over time.
  • Figure 2: Visualization of (a) raw events, (b) the EROS representation, and (c) a template generated by rendering the model at a perturbed pose and extracting edge gradients.
  • Figure 3: Snapshots of objects in synthetic data sequences: (a) regular motion sample, and (b) fast motion sample.
  • Figure 4: Dual-camera setup. The left device is an event camera with resolution 1280$\times$720. The right device is a RealSense D415 RGB-D camera, used only for evaluating baseline algorithms; it is not an input to our algorithm.
  • Figure 5: Propagation and local correction strategy. The (left) figure shows error accumulation if velocity integration is used alone. The (middle) figure demonstrates a failure case of perturbation correction, in which noise or discretisation error allows the true pose to leave the perturbation region without correction. The (right) figure shows the proposed method combining both approaches: propagation makes the difference between the propagated pose and the ground-truth pose more likely to remain within the bounds of the local correction space, and the correction, conversely, removes the integration error.
  • ...and 1 more figure
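
The local correction step outlined in the Figure 1 and Figure 5 captions can be sketched as a small search over perturbed poses. The sketch below assumes a pose stored as a 6-vector (three translations, three rotation parameters), one positive and one negative step per degree of freedom, and a caller-supplied `score_fn` standing in for the comparison between a rendered edge template and the EROS representation. The perturbation set, the step sizes, and all names are hypothetical; this section does not specify the ones used in the paper.

```python
def candidate_poses(pose, step_t, step_r):
    """Return the propagated pose plus +/- one step along each of the 6 DoF."""
    candidates = [list(pose)]  # the unperturbed pose is always a candidate
    for axis in range(6):
        step = step_t if axis < 3 else step_r  # translation vs rotation step
        for sign in (1.0, -1.0):
            perturbed = list(pose)
            perturbed[axis] += sign * step
            candidates.append(perturbed)
    return candidates


def correct_pose(pose, score_fn, step_t=0.005, step_r=0.02):
    """Pick the candidate pose whose template scores highest (hypothetical helper).

    score_fn(pose) -> float is an assumed stand-in for rendering a template
    at `pose` and measuring its similarity to the EROS representation.
    """
    return max(candidate_poses(pose, step_t, step_r), key=score_fn)


# Toy usage: the score is highest at a "true" pose 5 mm away along x.
true_pose = [0.005, 0.0, 0.0, 0.0, 0.0, 0.0]
propagated = [0.0] * 6


def toy_score(p):
    # Negative squared distance to the true pose; larger is better.
    return -sum((a - b) ** 2 for a, b in zip(p, true_pose))


corrected = correct_pose(propagated, toy_score)
```

If the true pose lies outside the perturbation region, the search can only return the nearest candidate, which corresponds to the failure case shown in the middle panel of Figure 5. Propagating the pose first keeps the remaining error small enough for this local search to cover.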