Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems

Andrew C. Freeman, Ketan Mayer-Patel, Montek Singh

TL;DR

The paper tackles the high data rates of long-duration surveillance video by transcoding framed video into sparse, asynchronous intensity samples with an enhanced ADΔER framework. It introduces a suite of codec improvements (absolute timing, a redefined Δt_max, adaptive thresholds, CRF, and multifaceted control of D), a lossy compression scheme (ADUs, event cubes, CABAC), and an asynchronous FAST feature detector, and evaluates them on the VIRAT dataset. Key findings are up to 2.5:1 compression with minor PSNR loss and a median 43.7% speedup in FAST feature detection over frame-based OpenCV, with performance varying by motion complexity; feature-driven rate control further improves downstream fidelity. The work shows that asynchronous, content-aware representations can outperform traditional frame-based pipelines for surveillance analytics, and it lays groundwork for integration with neuromorphic sensors and spiking neural networks.

Abstract

The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Since it is not straightforward for applications to extract information on temporal redundancy from the compressed video representations, we propose a novel system which conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADΔER to transcode framed videos to sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset, and we show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV, but only process pixels that receive new asynchronous events, rather than process every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
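
The idea of running the same FAST test only on pixels that receive new events can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `is_fast_corner` and `update_features`, the `(y, x, intensity)` event tuple, and the default threshold are all assumptions made for illustration. In particular, real ADΔER events carry D and Δt rather than a ready-made intensity, and whether neighboring pixels should also be retested is left open here.

```python
import numpy as np

# Offsets (dy, dx) of the 16-pixel Bresenham circle of radius 3 used by FAST.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]


def is_fast_corner(img, y, x, threshold=20, n=9):
    """FAST segment test at one pixel (hypothetical helper).

    Returns True if n contiguous circle pixels are all brighter than
    img[y, x] + threshold or all darker than img[y, x] - threshold.
    """
    h, w = img.shape
    if y < 3 or x < 3 or y >= h - 3 or x >= w - 3:
        return False  # the circle would fall outside the image
    center = int(img[y, x])
    ring = [int(img[y + dy, x + dx]) for dy, dx in CIRCLE]
    for sign in (1, -1):  # +1: brighter arc, -1: darker arc
        flags = [sign * (v - center) > threshold for v in ring]
        run = 0
        # Append the first n - 1 flags so runs that wrap around are counted.
        for flag in flags + flags[:n - 1]:
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False


def update_features(img, features, events, threshold=20):
    """Apply events and retest only the pixels they touch.

    events: iterable of (y, x, intensity) tuples, a simplification of
    real event data assumed for this sketch.
    features: set of (y, x) coordinates currently marked as corners.
    """
    for y, x, value in events:
        img[y, x] = value
        if is_fast_corner(img, y, x, threshold):
            features.add((y, x))
        else:
            features.discard((y, x))
    return features
```

A frame-based detector would call the segment test once per pixel per frame; here the work scales with the number of events, which is why static surveillance scenes benefit most.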

Paper Structure

This paper contains 24 sections, 5 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Simplified flowchart of our lossy compression scheme
  • Figure 2: Comparison between FAST feature detection in a classical video system (top) and our ADΔER-based system (bottom). With ADΔER, the decompressed representation is itself sparse, meaning that the application has much less data to process.
  • Figure 3: Key metrics gathered for a particular video. The Lossless lines are achievable in the work of freeman_mmsys23 with M = 0, while the other lines result from our new CRF mechanism (see the CRF section). (a) The bitrates of the raw ADΔER representations (before arithmetic coding) at our four quality levels. For comparison, the bitrate of a raw decoded image frame is constant. (b) The mean squared error of framed reconstructions of the raw ADΔER events. (c) The execution time for FAST feature detection. (d) The total number of detected features present, over time.
  • Figure 4: Zoomed-in view of the effect of feature-driven rate adaptation during transcode. (a) shows the input to the transcoder, which was compressed with H.265 at CRF level 23. (b)-(d) show views of the events transcoded under the Low quality setting. (e)-(g) show views of the same transcode setting with feature-driven rate adaptation enabled, as described in the section on feature-driven rate control. The D and Δt images are normalized, such that darker pixels correspond to smaller D and Δt, respectively.
  • Figure 5: Representative rate-distortion curves showing the effect of our transcoder quality settings and feature-driven rate adaptation. At the Low quality setting, our compressed representation approaches or surpasses the bitrate of the H.265-encoded source video, while maintaining a high PSNR value. Note that the H.265 data point for each video expresses only the bitrate, since its PSNR (with reference to itself) is undefined. We show one video per plot for readability, since the other videos in each motion category follow similar patterns.
  • ...and 1 more figure