Low Latency Visual Inertial Odometry with On-Sensor Accelerated Optical Flow for Resource-Constrained UAVs
Jonas Kühne, Michele Magno, Luca Benini
TL;DR
The paper addresses the challenge of achieving low-latency, energy-efficient visual inertial odometry on resource-constrained UAVs by moving the optical flow computation onto an on-sensor ASIC (VD56G3). It introduces OF VINS-Mono, a hardware-software co-design that replaces the host-side feature tracker with on-sensor optical flow data while keeping the VINS-Mono estimator on a Raspberry Pi Compute Module 4. The authors report a 49.4% reduction in latency and a 53.7% reduction in compute load relative to the original VINS-Mono, which raises the achievable frame rate from 20 FPS to up to 50 FPS while keeping tracking accuracy competitive. They also provide a new dataset with ground-truth poses. The practical impact lies in enabling robust, real-time VIO on small, power-limited UAVs, with potential extension to AR/VR and nano drones through further hardware scaling.
Abstract
Visual Inertial Odometry (VIO) is the task of estimating the movement trajectory of an agent from an onboard camera stream fused with additional Inertial Measurement Unit (IMU) measurements. A crucial subtask within VIO is the tracking of features, which can be achieved through Optical Flow (OF). Computing OF is demanding in both computational load and memory footprint, and it must run at low latency, especially in robotic applications; OF estimation is therefore performed today on powerful CPUs or GPUs. This restricts its use in a broad spectrum of applications where deploying such powerful, power-hungry processors is infeasible due to constraints on cost, size, and power consumption. On-sensor hardware acceleration is a promising approach to enable low-latency VIO even on resource-constrained devices such as nano drones. This paper assesses the speed-up in a VIO sensor system that exploits a compact OF sensor consisting of a global shutter camera and an Application Specific Integrated Circuit (ASIC). By replacing the feature tracking logic of the VINS-Mono pipeline with data from this OF camera, we demonstrate a 49.4% reduction in latency and a 53.7% reduction in compute load of the VIO pipeline over the original VINS-Mono implementation, allowing VINS-Mono to operate at up to 50 FPS instead of 20 FPS on the quad-core ARM Cortex-A72 processor of a Raspberry Pi Compute Module 4.
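
To make the idea of replacing host-side feature tracking with sensor-provided optical flow more concrete, the following is a minimal sketch in Python. It is not the authors' implementation and does not reflect the actual VD56G3 output format: the `FlowVector` layout, the `propagate_tracks` function, and the `max_dist` matching threshold are all hypothetical names and assumptions chosen for illustration. The sketch assumes the sensor reports, per frame, a list of positions in the previous frame together with their displacement to the current frame, so the host only has to associate existing tracks with those vectors instead of computing flow itself.

```python
from dataclasses import dataclass


@dataclass
class FlowVector:
    """One sensor-reported flow vector (hypothetical layout, not the VD56G3 format)."""
    x: float   # position in the previous frame, in pixels
    y: float
    dx: float  # displacement from the previous to the current frame, in pixels
    dy: float


def propagate_tracks(tracks, flows, max_dist=1.0):
    """Advance tracked features using sensor-provided flow vectors.

    tracks:   dict mapping feature id -> (x, y) in the previous frame
    flows:    list of FlowVector reported for the current frame
    max_dist: assumed matching radius in pixels

    Returns a dict mapping feature id -> (x, y) in the current frame.
    Features with no flow vector within max_dist are dropped.
    """
    updated = {}
    for fid, (px, py) in tracks.items():
        best = None
        best_d2 = max_dist ** 2
        # Nearest-neighbour association between the track and the flow origins.
        for f in flows:
            d2 = (f.x - px) ** 2 + (f.y - py) ** 2
            if d2 <= best_d2:
                best, best_d2 = f, d2
        if best is not None:
            updated[fid] = (best.x + best.dx, best.y + best.dy)
    return updated


tracks = {1: (10.0, 20.0), 2: (100.0, 100.0)}
flows = [FlowVector(10.0, 20.0, 1.5, -0.5)]
result = propagate_tracks(tracks, flows)
```

In this toy example, feature 1 is moved by its flow vector, while feature 2 is dropped because no flow vector lies near it. The point of the sketch is only that the per-frame host work reduces to a cheap association step, which is the kind of saving the abstract attributes to on-sensor acceleration.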
