Low-Latency Scalable Streaming for Event-Based Vision

Andrew Hamara, Benjamin Kilpatrick, Alex Baratta, Brendon Kofink, Andrew C. Freeman

TL;DR

This work first demonstrates that a state-of-the-art object detection application is resilient to dramatic data loss, and that this loss may be weighted towards the end of each temporal window. It then proposes a scalable streaming method for event-based data, built on Media Over QUIC, that prioritizes object detection performance and low latency.

Abstract

Recently, we have witnessed the rise of novel "event-based" camera sensors for high-speed, low-power video capture. Rather than recording discrete image frames, these sensors output asynchronous "event" tuples with microsecond precision, only when the brightness change of a given pixel exceeds a certain threshold. Although these sensors have enabled compelling new computer vision applications, these applications often require expensive, power-hungry GPU systems, making them unsuitable for deployment on the low-power devices for which event cameras are optimized. Whereas receiver-driven rate adaptation is a crucial feature of modern video streaming solutions, this topic is underexplored in the realm of event-based vision systems. On a real-world event camera dataset, we first demonstrate that a state-of-the-art object detection application is resilient to dramatic data loss, and that this loss may be weighted towards the end of each temporal window. We then propose a scalable streaming method for event-based data based on Media Over QUIC, prioritizing object detection performance and low latency. The application server can receive complementary event data across several streams simultaneously, and drop streams as needed to maintain a certain latency. With a latency target of 5 ms for end-to-end transmission across a small network, we observe an average reduction in detection mAP as low as 0.36. With a more relaxed latency target of 50 ms, we observe an average mAP reduction as low as 0.19.
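As a rough illustration of what "loss weighted towards the end of each temporal window" could look like, the following minimal Python sketch keeps only the earliest events of one window that fit within a bandwidth budget and discards the tail. The `Event` layout, the fixed `BYTES_PER_EVENT` size, and the `truncate_window` helper are assumptions made for this example; the abstract does not specify the event encoding or how the bitrate is accounted for.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One event tuple (field layout is an assumption for this sketch)."""
    t_us: int      # timestamp in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # 1 = brightness increase, 0 = decrease


# Hypothetical fixed encoded size per event; real encodings differ.
BYTES_PER_EVENT = 8


def truncate_window(events: List[Event], window_start_us: int,
                    window_us: int, bandwidth_bps: int) -> List[Event]:
    """Keep the earliest events of one temporal window that fit the budget.

    Events past the byte budget are dropped, so all loss falls at the
    end of the window.
    """
    # Bytes that can be sent during one window at the given bit rate.
    budget_bytes = bandwidth_bps * window_us // (8 * 1_000_000)
    max_events = budget_bytes // BYTES_PER_EVENT
    window_end_us = window_start_us + window_us
    in_window = sorted(
        (e for e in events if window_start_us <= e.t_us < window_end_us),
        key=lambda e: e.t_us,
    )
    return in_window[:max_events]
```

With a 50 ms window and this assumed event size, a very low budget keeps only the first few events, while a generous budget leaves the window untouched.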

Paper Structure

This paper contains 16 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Examples of event reduction at various bandwidth limits, with object detections overlaid. At lower bandwidths, there are fewer events in the raw representation, lowering the application accuracy.
  • Figure 2: Example of a "smart city" video system made possible by scalable streaming with MoQ. Heavy vision application computation can be offloaded from heterogeneous edge sensors, prioritizing low latency and receiving a subset of the published data. Meanwhile, a video archival server can receive all the data published by the cameras, since it is more tolerant to spikes in latency.
  • Figure 3: The change in object detection accuracy at various bandwidth levels for each video in our test dataset, without latency-driven rate adaptation (\ref{sec:fixed_bitrate}). Object detection performance remains high even when the majority of the events are removed.
  • Figure 4: Example of event partitioning and reconstruction with the two strategies described in \ref{sec:event_division}. The receiver is subscribed to only 2 tracks, so the application receives a subset of the source data for inference. The (a) strategy provides a more even distribution of events over the time window, at the cost of enormous latency to interleave the streams for reconstruction. Meanwhile, the (b) strategy has little overhead for reconstruction and the application performance is only negligibly worse.
  • Figure 5: Experimental results from sample video test_day_013 with $N=5$ tracks and $B=100$ Mbps, showing various metrics across the timespan of the video. These results demonstrate a case where the event loss induced by our streaming system actually increases the mAP for many time segments. The increase in the source data rate towards the end of the video corresponds to an increase in latency for the relaxed configuration and a dropoff in mAP for both the strict and relaxed configurations.
  • ...and 2 more figures
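
The Figure 4 caption does not name its two partitioning strategies. The Python sketch below assumes that (a) is a round-robin interleave of events across tracks and that (b) assigns each track a contiguous time slice of the window; both readings are guesses based only on the caption's description. All function names are hypothetical, and events are modeled as plain `(t_us, x, y, polarity)` tuples.

```python
import heapq
from typing import List, Sequence, Tuple

Event = Tuple[int, int, int, int]  # (t_us, x, y, polarity), assumed layout


def partition_round_robin(events: Sequence[Event],
                          n_tracks: int) -> List[List[Event]]:
    """Assumed strategy (a): deal time-sorted events across tracks in turn."""
    tracks: List[List[Event]] = [[] for _ in range(n_tracks)]
    for i, ev in enumerate(events):
        tracks[i % n_tracks].append(ev)
    return tracks


def partition_temporal(events: Sequence[Event], n_tracks: int,
                       window_start_us: int,
                       window_us: int) -> List[List[Event]]:
    """Assumed strategy (b): track k carries the k-th time slice of the window."""
    tracks: List[List[Event]] = [[] for _ in range(n_tracks)]
    for ev in events:
        idx = (ev[0] - window_start_us) * n_tracks // window_us
        tracks[min(max(idx, 0), n_tracks - 1)].append(ev)
    return tracks


def reconstruct_round_robin(tracks: Sequence[Sequence[Event]]) -> List[Event]:
    """Interleaved tracks need a k-way merge by timestamp."""
    return list(heapq.merge(*tracks, key=lambda ev: ev[0]))


def reconstruct_temporal(tracks: Sequence[Sequence[Event]]) -> List[Event]:
    """Time-sliced tracks are already ordered, so concatenation suffices."""
    return [ev for track in tracks for ev in track]


# A receiver subscribed to only the first 2 of 5 tracks, as in the caption.
events = [(t, 0, 0, 1) for t in range(0, 1000, 100)]
subset_a = reconstruct_round_robin(partition_round_robin(events, 5)[:2])
subset_b = reconstruct_temporal(partition_temporal(events, 5, 0, 1000)[:2])
```

Under these assumptions, `subset_a` spans the whole window sparsely, while `subset_b` covers only the start of the window densely, so the loss in (b) falls at the end of the window.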