Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

Xiang Chen; Wenjie Zhu; Jiayuan Chen; Tong Zhang; Changyan Yi; Jun Cai

TL;DR

The paper addresses real-time video analysis on intelligent visual devices that reach an edge server over a wireless network with fluctuating conditions, where both the per-frame offloading decision and the configuration of each offloaded frame must adapt. The system pairs a tracking-assisted object detection module (TAODM) with a region of interest module (ROIM); together they perform spatial-temporal semantic filtering to reduce data transmission while maintaining detection quality. A two-layer reinforcement-learning framework (DCRL) couples a DDQN-based offloading policy with a CMAB-based adaptive configuration policy, trained in a coordinated procedure across the two layers, to maximize processing rate and accuracy. Experiments on multi-camera pedestrian datasets show a higher processing rate, higher mAP, and lower latency than several baselines, which suggests practical potential for edge-enabled intelligent vision devices.

Abstract

This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interest module (ROIM). TAODM adaptively decides, for each video frame, whether to process it locally with a tracking algorithm or to offload it to the edge server for inference with an object detection model. ROIM determines the resolution and detection model configuration of each offloaded frame so that the analysis results are returned in time. TAODM and ROIM work jointly to filter out repetitive spatial-temporal semantic information, maximizing the processing rate while ensuring high video analysis accuracy. Unlike most existing works, this paper investigates a real-time video analysis system in which the intelligent visual device connects to the edge server through a wireless network with fluctuating conditions. We decompose the real-time video analysis problem into two sub-problems: offloading decision and configuration selection. To solve them, we introduce a double deep Q network (DDQN) based offloading approach and a contextual multi-armed bandit (CMAB) based adaptive configuration selection approach, respectively. A DDQN-CMAB reinforcement learning (DCRL) training framework is further developed to integrate these two approaches and improve the overall video analysis performance. Extensive simulations are conducted to evaluate the proposed solution and demonstrate its superiority over baseline approaches.
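
To make the two learning components named in the abstract more concrete, the following is a minimal Python sketch of their textbook forms, not the authors' implementation. It shows the standard Double DQN target (the online network picks the next action, the target network scores it) and a simple tabular contextual bandit that runs UCB1 separately for each discrete context. The context labels (such as a bandwidth level), the arm names, the reward values, and all function and class names are hypothetical. The abstract does not specify the state, reward, or bandit algorithm actually used.

```python
import math
from collections import defaultdict


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Standard Double DQN target for one transition.

    q_online_next / q_target_next are lists of Q-values for the next state,
    one entry per action (here, hypothetically: track locally vs. offload).
    """
    if done:
        return reward
    # Online network selects the action; target network evaluates it.
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


class ContextualUCB:
    """Tabular contextual bandit: an independent UCB1 learner per context.

    This is an assumed stand-in for the paper's CMAB approach, chosen for brevity.
    """

    def __init__(self, arms, c=1.0):
        self.arms = list(arms)            # e.g. hypothetical (resolution, model) choices
        self.c = c                        # exploration weight
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.means = defaultdict(float)   # (context, arm) -> running mean reward

    def select(self, context):
        # Try every arm once in this context before using the UCB score.
        for arm in self.arms:
            if self.counts[(context, arm)] == 0:
                return arm
        total = sum(self.counts[(context, a)] for a in self.arms)

        def score(arm):
            n = self.counts[(context, arm)]
            bonus = self.c * math.sqrt(2.0 * math.log(total) / n)
            return self.means[(context, arm)] + bonus

        return max(self.arms, key=score)

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental update of the running mean reward.
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```

In a two-layer arrangement of this kind, the first function would supply training targets for the offloading policy, while the bandit would be queried only for frames that the policy chooses to offload, with the observed reward fed back through `update`.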
