Compression Metadata-assisted RoI Extraction and Adaptive Inference for Efficient Video Analytics

Chengzhi Wang, Peng Yang

TL;DR

The paper tackles the high computational burden of edge video analytics by exploiting encoding metadata to guide RoI extraction and by dispatching RoIs to appropriately scaled neural networks. It introduces a motion-vector-based RoI pipeline with a five-step process and employs a convex-relaxation-based resource allocation plus a CTU bitrate–driven RoI scheduling algorithm to balance accuracy and latency. Empirical results on real-world video datasets show nearly 40% latency reduction and an average 2.2% accuracy improvement over strong baselines, demonstrating practical gains for resource-constrained edge deployments. The work highlights how encoding metadata can yield robust RoI signals and enable end-to-edge collaboration with adaptive, multi-scale analytics for efficient video understanding.

Abstract

Video analytics demands substantial computing resources, which poses significant challenges in resource-constrained environments. In this paper, to achieve high accuracy with an acceptable computational workload, we propose a cost-effective region-of-interest (RoI) extraction and adaptive inference scheme based on informative encoding metadata. Specifically, to achieve efficient RoI-based analytics, we exploit motion vectors from the encoding metadata to identify RoIs in non-reference frames through a morphological opening operation. Furthermore, because RoI content varies and therefore calls for inference by models of different sizes, we measure RoI complexity using the bitrate allocation information in the encoding metadata. Finally, we design an algorithm that prioritizes scheduling RoIs to models of appropriate complexity, balancing accuracy and latency. Extensive experimental results show that the proposed scheme reduces latency by nearly 40% and improves accuracy by 2.2% on average, outperforming the latest benchmarks.
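
To make the morphological opening step mentioned in the abstract concrete, the following is a minimal sketch and not the authors' pipeline. It assumes the motion vectors have already been parsed into a NumPy array of shape (rows, cols, 2) holding one (dx, dy) pair per block. The function names, the magnitude threshold of 1.0, and the 3 x 3 structuring element are all hypothetical choices made for illustration. Opening (erosion followed by dilation) removes isolated noisy blocks while keeping compact moving regions.

```python
import numpy as np


def _erode(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary erosion with a k x k square element; outside the grid counts as False."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def _dilate(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary dilation with a k x k square element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def opening(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Morphological opening: erosion followed by dilation."""
    return _dilate(_erode(mask, k), k)


def roi_mask_from_motion_vectors(mv: np.ndarray, threshold: float = 1.0,
                                 k: int = 3) -> np.ndarray:
    """Mark blocks whose motion magnitude exceeds `threshold` (an assumed
    value), then apply opening to drop isolated blocks."""
    magnitude = np.hypot(mv[..., 0], mv[..., 1])
    return opening(magnitude > threshold, k)


def bounding_box(mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) in block units, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))


if __name__ == "__main__":
    # Synthetic 8 x 8 block grid: one 4 x 4 moving region plus one noisy block.
    mv = np.zeros((8, 8, 2))
    mv[2:6, 2:6] = (3.0, 0.0)
    mv[0, 7] = (5.0, 5.0)
    print(bounding_box(roi_mask_from_motion_vectors(mv)))
```

In this synthetic case the single noisy block in the corner is removed by the erosion, while the 4 x 4 moving region survives the opening unchanged, so the bounding box covers only that region. A real deployment would more likely use a library routine for the opening and would need to scale block coordinates back to pixels.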

Paper Structure

This paper contains 21 sections, 7 equations, 11 figures, 1 table, 1 algorithm.

Figures (11)

  • Figure 1: A typical example of RoI inference: inference performance on (a) the whole frame with the YOLOv8s model, (b) the RoIs with two models chosen adaptively, and (c) the whole frame with the YOLOv8x model.
  • Figure 2: Performance variation of different models at various scales with an input size of 224 × 224.
  • Figure 3: Accuracy improvement differences in spatial distribution within frames.
  • Figure 4: Illustration of the system model.
  • Figure 5: Illustration of encoding-metadata-assisted RoI extraction.
  • ...and 6 more figures