Compression Metadata-assisted RoI Extraction and Adaptive Inference for Efficient Video Analytics
Chengzhi Wang, Peng Yang
TL;DR
The paper tackles the high computational burden of edge video analytics by exploiting encoding metadata to guide RoI extraction and by dispatching RoIs to neural networks of appropriate scale. It introduces a five-step, motion-vector-based RoI extraction pipeline and employs a convex-relaxation-based resource allocation together with a coding tree unit (CTU) bitrate-driven RoI scheduling algorithm to balance accuracy and latency. On real-world video datasets, the scheme reduces latency by nearly 40% and improves accuracy by 2.2% on average over strong baselines, demonstrating practical gains for resource-constrained edge devices. The work shows how encoding metadata can provide robust RoI signals and enable end-to-edge collaboration with adaptive, multi-scale analytics for efficient video understanding.
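The core idea of motion-vector-based RoI extraction can be illustrated with a minimal sketch. It is not the paper's five-step pipeline, whose individual steps are not given here; it only shows thresholding, morphological opening, and boxing under stated assumptions. It assumes a per-block motion-vector field `(dx, dy)` has already been parsed from a non-reference frame, and the helper names (`mv_mask`, `opening`, `roi_boxes`), the magnitude threshold, the 3x3 structuring element, and 8-connectivity are all hypothetical choices for illustration.

```python
from collections import deque
from math import hypot


def mv_mask(mvs, threshold):
    """Binary mask: 1 where a block's motion-vector magnitude exceeds threshold (assumed rule)."""
    return [[1 if hypot(dx, dy) > threshold else 0 for (dx, dy) in row] for row in mvs]


def _morph(mask, erode):
    """3x3 erosion (erode=True) or dilation (erode=False); out-of-bounds neighbours count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(all(vals)) if erode else int(any(vals))
    return out


def opening(mask):
    """Morphological opening = erosion followed by dilation; removes isolated noisy blocks."""
    return _morph(_morph(mask, erode=True), erode=False)


def roi_boxes(mask):
    """Bounding boxes (x0, y0, x1, y1), in block units, of 8-connected foreground components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Example: a 3x3-block moving object plus one isolated noisy block.
mvs = [[(0, 0)] * 8 for _ in range(6)]
for y in range(1, 4):
    for x in range(2, 5):
        mvs[y][x] = (4, 3)
mvs[5][7] = (6, 0)

raw = roi_boxes(mv_mask(mvs, 2.0))
clean = roi_boxes(opening(mv_mask(mvs, 2.0)))
print(raw)    # [(2, 1, 4, 3), (7, 5, 7, 5)]
print(clean)  # [(2, 1, 4, 3)]
```

In the example, the isolated block would become a spurious RoI without the opening step; erosion removes it, and dilation restores the solid moving region to its original extent.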
Abstract
Video analytics demand substantial computing resources, posing significant challenges in resource-constrained computing environments. In this paper, to achieve high accuracy with an acceptable computational workload, we propose a cost-effective region-of-interest (RoI) extraction and adaptive inference scheme based on informative encoding metadata. Specifically, to achieve efficient RoI-based analytics, we exploit motion vectors from the encoding metadata to identify RoIs in non-reference frames through a morphological opening operation. Furthermore, because RoI content varies and therefore calls for inference by models of different sizes, we measure RoI complexity using the bitrate allocation information in the encoding metadata. Finally, we design an algorithm that prioritizes scheduling RoIs to models of appropriate complexity, balancing accuracy and latency. Extensive experimental results show that our proposed scheme reduces latency by nearly 40% and improves accuracy by 2.2% on average, outperforming the latest baselines.
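The exact complexity measure and scheduling algorithm are not specified in this summary, so the following is only a hypothetical greedy heuristic illustrating the general idea of bitrate-guided model selection under a latency budget; it is not the paper's algorithm and not the convex-relaxation allocation mentioned in the TL;DR. It assumes each RoI carries the bits the encoder spent on each CTU it overlaps, uses the mean bits per CTU as the complexity proxy, and the model names, latencies, thresholds, and budget are all made-up illustrative values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Model:
    name: str
    latency_ms: float  # assumed per-RoI inference latency


@dataclass
class RoI:
    rid: int
    ctu_bits: List[int]  # bits per overlapped CTU (assumed available from the encoder)

    @property
    def complexity(self) -> float:
        # Hypothetical proxy: mean bits per overlapped CTU.
        return sum(self.ctu_bits) / len(self.ctu_bits)


def schedule(rois: List[RoI], models: List[Model], thresholds: List[float],
             budget_ms: float) -> Dict[int, Optional[str]]:
    """models are ordered smallest to largest; thresholds are ascending, len(models) - 1 cutoffs."""
    plan: Dict[int, Optional[str]] = {}
    remaining = budget_ms
    # Most complex RoIs first, so they get first claim on the budget for larger models.
    for roi in sorted(rois, key=lambda r: r.complexity, reverse=True):
        tier = sum(roi.complexity >= t for t in thresholds)  # preferred model index
        # Downgrade to smaller models until one fits the remaining latency budget.
        while tier >= 0 and models[tier].latency_ms > remaining:
            tier -= 1
        if tier < 0:
            plan[roi.rid] = None  # budget exhausted: RoI is not analysed
            continue
        plan[roi.rid] = models[tier].name
        remaining -= models[tier].latency_ms
    return plan


models = [Model("small", 5), Model("medium", 12), Model("large", 30)]
rois = [RoI(1, [5000, 6000]), RoI(2, [2000, 3000]), RoI(3, [300, 500]), RoI(4, [4500, 4500])]
plan = schedule(rois, models, thresholds=[1000, 4000], budget_ms=50)
print(plan)  # {1: 'large', 4: 'medium', 2: 'small', 3: None}
```

With a 50 ms budget, the two high-bitrate RoIs both prefer the large model, but only the more complex one gets it; the rest are downgraded or, once the budget runs out, skipped. A real scheme would also weigh the expected accuracy loss of each downgrade rather than relying on ordering alone.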
