Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems
Andrew C. Freeman, Ketan Mayer-Patel, Montek Singh
TL;DR
The paper addresses the high data rates of long-duration surveillance video by transcoding framed video into sparse, asynchronous intensity samples with an enhanced ADΔER framework. It contributes three things: a set of codec improvements (absolute timing, a redefined Δt_max, adaptive thresholds, CRF, and multifaceted D control), a lossy compression scheme (ADUs, event cubes, CABAC), and an asynchronous FAST feature detector. Evaluated on VIRAT data, the system reaches up to 2.5:1 compression with minor PSNR loss and a median 43.7% speedup in FAST feature detection over frame-based OpenCV, with performance varying by motion complexity; feature-driven rate control further improves downstream fidelity. The work demonstrates that asynchronous, content-aware representations can outperform traditional frame-based pipelines for surveillance analytics, and it lays groundwork for integration with neuromorphic sensors and spiking neural networks.
Abstract
The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Since it is not straightforward for applications to extract information on temporal redundancy from the compressed video representations, we propose a novel system which conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADΔER to transcode framed videos to sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset, and we show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV, but only process pixels that receive new asynchronous events, rather than processing every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
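The event-driven idea in the abstract (run the FAST test only at pixels that receive new events) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `(x, y, intensity)` event tuples, the list-of-lists image, and the threshold of 20 with 9 contiguous ring pixels are all assumptions made for illustration, and real ADΔER events carry more than a bare intensity value. Testing only the event pixel itself, and not its neighbors, follows the abstract's wording and is a simplification.

```python
# Bresenham circle of radius 3 used by FAST, as (dx, dy) offsets in ring order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, n=9):
    """Segment test: True if n contiguous ring pixels are all brighter than
    img[y][x] + threshold or all darker than img[y][x] - threshold."""
    h, w = len(img), len(img[0])
    if not (3 <= x < w - 3 and 3 <= y < h - 3):
        return False  # the ring would leave the image
    p = img[y][x]
    labels = []  # +1 brighter, -1 darker, 0 similar
    for dx, dy in CIRCLE:
        q = img[y + dy][x + dx]
        labels.append(1 if q > p + threshold else -1 if q < p - threshold else 0)
    run, prev = 0, 0
    for label in labels + labels[:n - 1]:  # extended so runs can wrap around
        if label != 0 and label == prev:
            run += 1
        else:
            run = 1 if label != 0 else 0
        prev = label
        if run >= n:
            return True
    return False


def detect_on_events(img, events, threshold=20):
    """Hypothetical event-driven loop: update one pixel per event and run the
    segment test only at that pixel."""
    corners = []
    for x, y, intensity in events:
        img[y][x] = intensity
        if is_fast_corner(img, x, y, threshold):
            corners.append((x, y))
    return corners


def detect_on_frame(img, threshold=20):
    """Frame-based baseline: run the same test at every interior pixel."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, threshold)]
```

In this sketch the work done by `detect_on_events` scales with the number of events, while `detect_on_frame` always visits every interior pixel, which is why a mostly static surveillance scene would favor the event-driven loop.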
