Labits: Layered Bidirectional Time Surfaces Representation for Event Camera-based Continuous Dense Trajectory Estimation

Zhongyang Zhang, Jiacheng Qiu, Shuyang Cui, Yijun Luo, Tauhidur Rahman

TL;DR

Labits addresses the challenge of preserving temporal granularity, stable 2D features, and consistent information density in event-camera representations for dense trajectory estimation by introducing Layered Bidirectional Time Surfaces. The Labits representation, coupled with an APLOF extractor and a RAFT-inspired trajectory framework based on Bézier curves, yields large gains on the MultiFlow dataset, achieving a 49% reduction in trajectory end-point error over the previous state-of-the-art. The approach demonstrates that event representations significantly influence downstream performance and offers a scalable, flexible pipeline for high-temporal-resolution motion estimation, with strong results and clear ablations. The work opens avenues for extending Labits to other event-based vision tasks and for combining Labits with complementary representations to balance temporal precision and event density.

Abstract

Event cameras provide a compelling alternative to traditional frame-based sensors, capturing dynamic scenes with high temporal resolution and low latency. Moving objects trigger events with precise timestamps along their trajectory, enabling smooth continuous-time estimation. However, few works have attempted to minimize the information lost when event representations are constructed, which imposes a ceiling on this task. Fully exploiting event cameras requires representations that simultaneously preserve fine-grained temporal information, stable and characteristic 2D visual features, and temporally consistent information density, a challenge that existing representations do not meet. We introduce Labits: Layered Bidirectional Time Surfaces, a simple yet elegant representation designed to retain all these features. Additionally, we propose a dedicated module for extracting active pixel local optical flow (APLOF), which significantly boosts performance. Our approach achieves a 49% reduction in trajectory end-point error (TEPE) compared to the previous state-of-the-art on the MultiFlow dataset. The code will be released upon acceptance.

Paper Structure

This paper contains 17 sections, 9 equations, 12 figures, 9 tables, 1 algorithm.

Figures (12)

  • Figure 1: (a) Labits generation schematic: For a 1D event camera, at each pixel and probe time $pt_i$, the algorithm searches for the most recent past event within $\delta t$. If none is found, it searches for the next future event within $\delta t$. The Labits value is the normalized time difference between probe time $pt_i$ and the found event's timestamp, or -1 if no event is found. Labits can be converted to APLOF via a small model. (b)-(g): Single-channel visualizations of the following event representations (more details can be found in related works): (b) Labits, (c) Voxel Grid, (d) TORE Volume, (e) Time Surface, (f) Event Count, (g) Event Frame. (h) RGB frame of the moving target. (i) Samples of Labits layers. Note that the first layer of TORE is exactly the same as the Time Surface.
  • Figure 2: (a) Labits-RAFT architecture: Labits are used to generate correlation blocks, content features, and APLOF features for intermediate movement integration, and to guide Active Pixel Mask (APM) generation. APM layers are multiplied point-wise with their corresponding APLOF feature layers. The features are used to compute a correlation matrix and, via a ConvGRU, to generate and refine a Bézier curve $\mathbf{B}$ for each pixel at $\tau_{\text{start}}$. For brevity, we only show the pure event-based pipeline. (b) Labits-to-APLOF Net: HR and LR APLOF are generated based on a customized U-Net and APM.
  • Figure 3: Visualization of detailed inputs and outcomes from our model. It predicts instantaneous APLOF at intermediate reference times, end-point optical flow (the end-point optical flow is computed as the displacement of each pixel along its entire trajectory), and pixel-level Bézier trajectories, all closely aligning with the corresponding ground truth data. *OF: Optical Flow.
  • Figure 4: Trajectory predictions on the MultiFlow dataset by our proposed model and baseline methods. Ground truth Bézier trajectories are shown in red, while predictions are depicted in blue. The background displays the ground truth optical flow to highlight moving objects. Our model's predicted trajectories significantly outperform those of all baseline methods.
  • Figure 5: Comparison of trajectory predictions between baseline methods and our approach.
  • ...and 7 more figures
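
The Labits generation rule described in the Figure 1(a) caption can be sketched for a 1D sensor as follows. This is a minimal illustration, not the authors' implementation: the function name `labits_1d`, its arguments, and the signed normalization `(pt - t) / delta_t` (non-negative for past events, negative for future events) are assumptions, since the caption only says the time difference is "normalized".

```python
import numpy as np

def labits_1d(xs, ts, num_pixels, probe_times, delta_t):
    """Hypothetical sketch of Labits for a 1D event camera.

    xs: pixel index of each event; ts: timestamp of each event.
    Returns an array of shape (len(probe_times), num_pixels).
    """
    xs = np.asarray(xs)
    ts = np.asarray(ts, dtype=float)
    # -1 marks pixels where no event is found within delta_t in either direction.
    layers = np.full((len(probe_times), num_pixels), -1.0)
    for x in range(num_pixels):
        t_px = np.sort(ts[xs == x])  # timestamps of events at this pixel
        for i, pt in enumerate(probe_times):
            # Step 1: most recent past event within delta_t of the probe time.
            past = t_px[(t_px <= pt) & (t_px >= pt - delta_t)]
            if past.size:
                layers[i, x] = (pt - past[-1]) / delta_t
                continue
            # Step 2: otherwise, the next future event within delta_t.
            future = t_px[(t_px > pt) & (t_px <= pt + delta_t)]
            if future.size:
                # Assumed sign convention: future events yield negative values.
                layers[i, x] = (pt - future[0]) / delta_t
    return layers

# Toy input: pixel 0 has a recent past event, pixel 1 only a future event,
# pixel 2 has no events at all.
layers = labits_1d(xs=[0, 0, 1], ts=[0.2, 0.9, 1.3],
                   num_pixels=3, probe_times=[1.0], delta_t=0.5)
```

Each probe time produces one layer, so stacking the layers over several probe times gives the layered representation. The past-first search order follows the caption; the per-pixel loop is written for clarity rather than speed.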