Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone
Pietro Bonazzi, Christian Vogt, Michael Jost, Lyes Khacef, Federico Paredes-Vallés, Michele Magno
TL;DR
This work compares event-based vision systems (EVS) with conventional RGB systems for real-time collision avoidance on an FPGA-equipped UAV platform. It introduces an end-to-end EVS/deep-learning pipeline that aggregates events into $80 \times 80 \times 1$ frames within a $T = 20~\text{ms}$ window and predicts the collision location $(x, y)$ and time-to-collision $t$, achieving an end-to-end latency of approximately $2.14~\text{ms}$. EVS demonstrates superior temporal and spatial accuracy, including a precision improvement of 59 percentage points and an F1 score of $0.73$ versus $0.06$ for RGB on move/stay detection, and it remains robust on out-of-distribution data. The approach runs a lightweight encoder on an FPGA with a DPU, enabling high-throughput, low-latency inference for real-time UAV navigation and safe decision-making in resource-constrained environments.
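The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `aggregate_events`, the per-pixel event-count representation, the decision to ignore polarity, and the assumption that event coordinates are already expressed on the $80 \times 80$ grid are all hypothetical choices made for illustration.

```python
import numpy as np

def aggregate_events(x, y, t, t_start, window_s=0.020, height=80, width=80):
    """Accumulate events from [t_start, t_start + window_s) into an H x W x 1 frame.

    Assumptions (not from the paper): each pixel stores an event count,
    polarity is ignored, and (x, y) are already on the height x width grid.
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)

    # Keep only events inside the time window and inside the frame.
    in_window = (t >= t_start) & (t < t_start + window_s)
    in_bounds = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    keep = in_window & in_bounds

    frame = np.zeros((height, width, 1), dtype=np.float32)
    # np.add.at accumulates correctly when the same pixel fires several times.
    np.add.at(frame[..., 0], (y[keep], x[keep]), 1.0)
    return frame

# Usage: five events, one outside the frame and one outside the 20 ms window.
xs = [1, 1, 2, 100, 4]
ys = [0, 0, 3, 5, 4]
ts = [0.001, 0.005, 0.019, 0.002, 0.020]
frame = aggregate_events(xs, ys, ts, t_start=0.0)
```

A plain `frame[y, x] += 1` would count repeated pixels only once, which is why the sketch uses `np.add.at`.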
Abstract
This work quantitatively evaluates the performance of event-based vision systems (EVS) against conventional RGB-based models for action prediction in collision avoidance on an FPGA accelerator. Our experiments demonstrate that the EVS model achieves a significantly higher effective frame rate (1 kHz) and lower temporal (-20 ms) and spatial (-20 mm) prediction errors than the RGB-based model, particularly when tested on out-of-distribution data. The EVS model also exhibits superior robustness in selecting optimal evasion maneuvers. In particular, in distinguishing between movement and stationary states, it achieves a 59 percentage point advantage in precision (78% vs. 19%) and a substantially higher F1 score (0.73 vs. 0.06), highlighting the susceptibility of the RGB model to overfitting. Further analysis across different combinations of spatial classes confirms the consistent performance of the EVS model on both test datasets. Finally, we evaluated the system end-to-end and measured a latency of approximately 2.14 ms, with event aggregation (1 ms) and inference on the processing unit (0.94 ms) accounting for the largest components. These results underscore the advantages of event-based vision for real-time collision avoidance and demonstrate its potential for deployment in resource-constrained environments.
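As a quick check of the reported numbers, the sketch below recomputes the latency left for the remaining pipeline stages and the precision gap. The figures are taken from the abstract; the helper names are hypothetical. The abstract does not break down the remaining latency, so the code only reports the residual (roughly 0.2 ms).

```python
# Figures quoted in the abstract (milliseconds and percent).
TOTAL_LATENCY_MS = 2.14
AGGREGATION_MS = 1.00
INFERENCE_MS = 0.94
EVS_PRECISION_PCT = 78
RGB_PRECISION_PCT = 19

def remaining_latency_ms(total, *components):
    """Latency not explained by the listed components."""
    return total - sum(components)

def share_of_total(component, total):
    """Fraction of the end-to-end latency spent in one component."""
    return component / total

residual_ms = remaining_latency_ms(TOTAL_LATENCY_MS, AGGREGATION_MS, INFERENCE_MS)
precision_gap_pp = EVS_PRECISION_PCT - RGB_PRECISION_PCT

print(f"residual latency: {residual_ms:.2f} ms")
print(f"aggregation share: {share_of_total(AGGREGATION_MS, TOTAL_LATENCY_MS):.1%}")
print(f"inference share: {share_of_total(INFERENCE_MS, TOTAL_LATENCY_MS):.1%}")
print(f"precision gap: {precision_gap_pp} percentage points")
```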
