
TCB-VIO: Tightly-Coupled Focal-Plane Binary-Enhanced Visual Inertial Odometry

Matthew Lisondra, Junseo Kim, Glenn Takashi Shimoda, Kourosh Zareinia, Sajad Saeedi

TL;DR

TCB-VIO delivers a tightly-coupled 6-DoF VIO designed for focal-plane sensor-processor arrays, achieving 250 FPS visual updates alongside a 400 Hz IMU stream. By performing binary edge/corner extraction on the sensor and tracking features with a binary-enhanced KLT tracker, then fusing the tracks with an MSCKF backbone, it maintains robust trajectory estimates under fast, aggressive motions. The approach outperforms ROVIO, VINS-Mono, and ORB-SLAM3 in indoor and outdoor tests, while offering substantial energy and latency advantages from on-sensor processing. This work demonstrates the potential of FPSPs to enable low-latency, power-efficient VIO for mobile robotics and motivates further hardware-software co-design to migrate more of the pipeline onto the sensor fabric.
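The idea of on-sensor binary edge extraction can be illustrated with a small emulation of a pixel-parallel array. The sketch below is a hypothetical, generic gradient-threshold edge detector, not the paper's on-sensor kernel: `shift` stands in for a NEWS neighbour transfer (with clamped borders, an assumption), and `binary_edges` and its `threshold` parameter are illustrative names. Every pixel applies the same operation, which an FPSP would execute simultaneously across the array; plain Python loops emulate that here. Corner extraction is omitted.

```python
from typing import List

Image = List[List[int]]


def shift(img: Image, dx: int, dy: int) -> Image:
    """Emulate a NEWS transfer: each processing element reads its neighbour at
    (x + dx, y + dy). Reads outside the array are clamped to the border
    (an assumption made for this sketch)."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def binary_edges(img: Image, threshold: int) -> Image:
    """Pixel-parallel edge test (hypothetical):
    |I_east - I| + |I_south - I| > threshold  ->  1, otherwise 0."""
    east = shift(img, 1, 0)   # value of the right-hand neighbour
    south = shift(img, 0, 1)  # value of the neighbour below
    h, w = len(img), len(img[0])
    return [[1 if abs(east[y][x] - img[y][x]) + abs(south[y][x] - img[y][x]) > threshold else 0
             for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    frame = [[0, 0, 0, 100, 100, 100] for _ in range(4)]  # vertical step edge
    for row in binary_edges(frame, threshold=50):
        print(row)  # each row: [0, 0, 1, 0, 0, 0]
```

Only a 1-bit map leaves the array in this scheme, which is the kind of data reduction that eases the sensor-to-host bottleneck.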

Abstract

Vision algorithms can be executed directly on the image sensor when implemented on next-generation sensors known as focal-plane sensor-processor arrays (FPSPs), in which every pixel has its own processor. FPSPs greatly reduce latency by alleviating the data-transfer bottleneck between the vision sensor and the processor, and thereby accelerate vision-based algorithms such as visual-inertial odometry (VIO). However, VIO frameworks suffer from spatial drift due to vision-based pose estimation, whilst temporal drift arises from the inertial measurements. FPSPs circumvent the spatial drift by operating at a high frame rate to match the high-frequency output of the inertial measurements. In this paper, we present TCB-VIO, a tightly-coupled 6-degrees-of-freedom VIO based on a Multi-State Constraint Kalman Filter (MSCKF), operating at a high frame rate of 250 FPS with IMU measurements obtained at 400 Hz. TCB-VIO outperforms the state-of-the-art methods ROVIO, VINS-Mono, and ORB-SLAM3.
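The rate pairing above implies simple timing arithmetic: at 250 FPS the frame period is 4 ms and at 400 Hz the IMU period is 2.5 ms, so on average 1.6 IMU samples arrive per frame and the pattern repeats every 20 ms (5 frames, 8 IMU samples). The minimal sketch below, assuming perfectly regular clocks that both start at t = 0, lists which IMU samples a filter such as an MSCKF would propagate through before each visual update. The function names are hypothetical, and a real implementation would also handle clock offset, jitter, and integration up to each exact frame timestamp.

```python
from typing import List, Tuple

IMU_RATE_HZ = 400
FRAME_RATE_HZ = 250
IMU_PERIOD_US = 1_000_000 // IMU_RATE_HZ      # 2500 us
FRAME_PERIOD_US = 1_000_000 // FRAME_RATE_HZ  # 4000 us


def imu_stamps(t_start: int, t_end: int, period: int = IMU_PERIOD_US) -> List[int]:
    """IMU timestamps (us) falling in the half-open interval (t_start, t_end]."""
    first = (t_start // period + 1) * period
    return list(range(first, t_end + 1, period))


def schedule(num_frames: int) -> List[Tuple[int, List[int]]]:
    """For each new frame, the IMU samples propagated since the previous frame,
    assuming synchronized, jitter-free clocks starting at t = 0."""
    out = []
    for k in range(1, num_frames + 1):
        t_prev, t_curr = (k - 1) * FRAME_PERIOD_US, k * FRAME_PERIOD_US
        out.append((t_curr, imu_stamps(t_prev, t_curr)))
    return out


if __name__ == "__main__":
    plan = schedule(5)  # 20 ms: one full repeat of the 4 ms / 2.5 ms pattern
    for t_frame, stamps in plan:
        print(f"frame @ {t_frame} us: propagate IMU at {stamps}")
    print([len(s) for _, s in plan])  # -> [1, 2, 1, 2, 2]
```

Because frames and IMU samples arrive at comparable rates, each visual update follows only one or two propagation steps, which limits how much inertial drift accumulates between corrections.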

Paper Structure

This paper contains 24 sections, 14 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: In a focal-plane sensor-processor array, each pixel, i.e., processing element, combines a photosensor circuit (PIX) with on-pixel compute resources, including an arithmetic-logic unit (ALU), input/output (I/O) circuits, local communication links (NEWS), local memory (Registers), and activity control (FLAG). This architecture enables image processing to be performed directly on the sensor [Dudek_ScienceRob_2022].
  • Figure 2: TCB-VIO processing pipeline. Numbers inside each block correspond to the paper sections describing them. Green blocks denote visual processing: the FPSP performs on-sensor corner and edge extraction once an image is formed, and the extracted features are transferred to the host where a novel binary-enhanced KLT tracker ensures fast and robust tracking. Blue blocks denote inertial propagation and updates via MSCKF.
  • Figure 3: Overview of the binary-enhanced KLT Tracking. The FPSP generates binary corners and edges. On the host, binary edges are feathered, and the KLT tracker operates in spatial windows centered on each corner feature.
  • Figure 4: Overview of representative test trajectories used in our evaluation, aligned with the performance metrics reported in the ATE/RTE table. All trajectories were executed under fast and hostile motions, as reflected by the high angular velocities (10 to 33 rad/s) reported in the same table. Ground truth is shown in gray, with the error map shown for TCB-VIO.
  • Figure 7: (Top row) Trajectory #8 (F-type failure), with failures occurring in the pink regions; TCB-VIO tracks consistently despite occasional deviations from ground truth. (Bottom row) Trajectory #3 (PF-type failure), illustrating ORB-SLAM3 reinitializing but quickly diverging (highlighted in pink), which increases its ATE and RTE.
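The binary-enhanced KLT idea in Figure 3 (feather the binary edges on the host, then track in spatial windows centred on corner features) can be sketched as follows. This is a hypothetical reconstruction under stated assumptions, not the authors' implementation: feathering is approximated by repeated 3x3 box blurs, and tracking uses a single-level translational Lucas-Kanade update in the style of the pyramidal KLT formulation used by OpenCV. `feather`, `sample`, `klt_track`, `half_win`, and `iters` are illustrative names.

```python
from typing import List, Tuple

Image = List[List[float]]


def feather(binary: List[List[int]], iterations: int = 3) -> Image:
    """Soften a 1-bit edge map with repeated 3x3 box blurs (borders clamped),
    an assumed stand-in for the paper's feathering step."""
    img = [[float(v) for v in row] for row in binary]
    h, w = len(img), len(img[0])
    for _ in range(iterations):
        img = [[sum(img[min(max(y + j, 0), h - 1)][min(max(x + i, 0), w - 1)]
                    for j in (-1, 0, 1) for i in (-1, 0, 1)) / 9.0
                for x in range(w)] for y in range(h)]
    return img


def sample(img: Image, x: float, y: float) -> float:
    """Bilinear interpolation; coordinates are clamped to the image."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.001)
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bot = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bot


def klt_track(prev: Image, curr: Image, corner: Tuple[int, int],
              half_win: int = 4, iters: int = 20) -> Tuple[float, float]:
    """Translational Lucas-Kanade in a window centred on `corner`;
    returns the estimated displacement (dx, dy) from prev to curr."""
    cx, cy = corner
    pts = [(cx + i, cy + j) for j in range(-half_win, half_win + 1)
           for i in range(-half_win, half_win + 1)]
    # Central-difference gradients of the previous (template) image.
    grads = [((sample(prev, x + 1, y) - sample(prev, x - 1, y)) / 2.0,
              (sample(prev, x, y + 1) - sample(prev, x, y - 1)) / 2.0) for x, y in pts]
    gxx = sum(gx * gx for gx, _ in grads)
    gxy = sum(gx * gy for gx, gy in grads)
    gyy = sum(gy * gy for _, gy in grads)
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-9:
        raise ValueError("window lacks gradients in two directions")
    dx = dy = 0.0
    for _ in range(iters):
        bx = by = 0.0
        for (x, y), (gx, gy) in zip(pts, grads):
            diff = sample(prev, x, y) - sample(curr, x + dx, y + dy)
            bx += diff * gx
            by += diff * gy
        # Solve the 2x2 normal equations G * eta = b in closed form.
        ex = (gyy * bx - gxy * by) / det
        ey = (gxx * by - gxy * bx) / det
        dx, dy = dx + ex, dy + ey
        if ex * ex + ey * ey < 1e-8:
            break
    return dx, dy
```

Feathering matters because a raw 1-bit edge map has zero gradient almost everywhere; blurring spreads each edge into a smooth ridge, and near a corner the two crossing ridges give the 2x2 matrix G full rank, so the window can be tracked in both directions.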