BIT-VO: Visual Odometry at 300 FPS using Binary Features from the Focal Plane

Riku Murai, Sajad Saeedi, Paul H. J. Kelly

TL;DR

BIT-VO tackles high-speed monocular visual odometry by moving feature extraction to a focal-plane sensor-processor (FPSP) and transmitting only binary features, enabling 6-DoF pose estimation at 300 FPS without intensity data. It introduces a compact 44-bit local binary descriptor for edges, and a robust on-host frame and map tracking pipeline that uses Levenberg–Marquardt optimisation with a small, fast bundle adjustment (BA) step. The approach is validated on a 256×256 SCAMP-5 platform, showing robustness to rapid motion and accuracy competitive with conventional VO systems, while running substantially faster. This work demonstrates the practicality of on-sensor computation for VO, informs FPSP device design, and suggests directions for noise modelling and benchmarking in FPSP-based SLAM-like systems.
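To make the idea of a 44-bit binary descriptor concrete, here is a minimal sketch of how two such descriptors could be compared. It assumes each descriptor is packed into a Python integer and that similarity is measured by Hamming distance, which is the usual choice for binary descriptors; the summary above does not state the exact matching rule used by BIT-VO. The names `hamming_distance` and `best_match` and the threshold `max_distance` are hypothetical.

```python
DESCRIPTOR_BITS = 44                      # descriptor length stated in the text
MASK = (1 << DESCRIPTOR_BITS) - 1         # keeps only the low 44 bits


def hamming_distance(a, b):
    """Count the bits that differ between two 44-bit descriptors."""
    return bin((a ^ b) & MASK).count("1")


def best_match(query, candidates, max_distance=8):
    """Return (index, distance) of the closest candidate, or None.

    max_distance is an illustrative threshold, not a value from the paper.
    """
    best = None
    for i, candidate in enumerate(candidates):
        d = hamming_distance(query, candidate)
        if d <= max_distance and (best is None or d < best[1]):
            best = (i, d)
    return best
```

For example, `best_match(0b1111, [0b0000, 0b1110])` picks the second candidate, since it differs from the query in a single bit.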

Abstract

A Focal-plane Sensor-processor (FPSP) is a next-generation camera technology in which every pixel on the sensor chip can perform computation in parallel, on the focal plane where the light intensity is captured. SCAMP-5, the general-purpose FPSP used in this work, carries out computations in the analog domain before analog-to-digital conversion. Extracting features from the image on the focal plane reduces the amount of data that is digitized and transferred. As a consequence, SCAMP-5 offers a high frame rate while maintaining low energy consumption. Here, we present BIT-VO, which is, to the best of our knowledge, the first 6 Degrees of Freedom visual odometry algorithm to utilise an FPSP. Our entire system operates at 300 FPS in a natural scene, using binary edge and corner features detected by the SCAMP-5.
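The data-reduction argument can be illustrated with a rough back-of-the-envelope calculation. The sketch below uses the 256×256 resolution and 300 FPS rate quoted in this summary, and assumes (these figures are not from the paper) that a conventional readout would send 8 bits per pixel while a binary feature map needs at most 1 bit per pixel. The actual transfer format used by SCAMP-5 may be more compact still.

```python
WIDTH, HEIGHT = 256, 256   # SCAMP-5 resolution given in the text
FPS = 300                  # frame rate given in the text


def bytes_per_second(bits_per_pixel, fps=FPS):
    """Raw readout rate for a full frame at the given bit depth."""
    return WIDTH * HEIGHT * bits_per_pixel * fps // 8


intensity_rate = bytes_per_second(8)  # assumption: 8-bit greyscale image
binary_rate = bytes_per_second(1)     # assumption: 1 bit per pixel feature map

print(f"intensity: {intensity_rate / 1e6:.2f} MB/s")
print(f"binary:    {binary_rate / 1e6:.2f} MB/s")
print(f"reduction: {intensity_rate // binary_rate}x")
```

Under these assumptions the binary map is an upper bound on the transferred data, since sparse features could also be sent as coordinate lists.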

Paper Structure

This paper contains 23 sections, 3 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Comparison of the data used by our proposed VO vs conventional VOs. Our system does not use intensity images (top row) but uses the binary edges and corners (bottom row) extracted by SCAMP-5 at 300 FPS. Notice that the edges, when extracted at a high frame rate, are tolerant of motion blur and remain sharp even when the device is subject to violent motions. For the conventional camera (operating at 20 FPS), such motion severely blurs the images.
  • Figure 2: Tracking and Mapping pipeline. The pipeline runs on an FPSP and a host device, minimising data flow from the sensor to the host device.
  • Figure 3: Illustration of the effect of noisy analog computation. Between two consecutive frames, many corners appear and disappear. The device was mounted on a tripod to ensure stability of the device across multiple frames.
  • Figure 4: Descriptor sampling pattern. Each colour denotes a different ring, and the indices correspond to the bit index.
  • Figure 5: Estimated x, y, z translations for "Long" sequence. Solid lines show our estimate and dotted lines are the ground truth.
  • ...and 5 more figures