BiSwift: Bandwidth Orchestrator for Multi-Stream Video Analytics on Edge
Lin Sun, Weijun Wang, Tingting Yuan, Liang Mi, Haipeng Dai, Yunxin Liu, Xiaoming Fu
TL;DR
BiSwift targets real-time multi-stream video analytics on bandwidth-limited edge deployments. It is a bi-level framework that couples a novel adaptive hybrid codec with a global bandwidth controller: a DRL-driven frame classifier selects informative anchors, and a BLO-based bandwidth manager allocates bandwidth fairly across streams. BiSwift performs real-time object detection on up to 9 streams with a single RTX 3070, achieving 10–21% accuracy improvements and 1.2–9× throughput gains over prior video analytics pipelines (VAPs), along with a 19% accuracy boost over pure codec baselines. The work demonstrates scalable, analytics-aware video processing at the edge, with practical relevance to surveillance and traffic-monitoring deployments.
Abstract
High-definition (HD) cameras for surveillance and road traffic monitoring have seen tremendous growth, demanding intensive computation resources for real-time analytics. Recently, offloading frames from the front-end device to the back-end edge server has shown great promise. When multiple streams compete for resources, efficient bandwidth management and proper scheduling are crucial to ensure both high inference accuracy and high throughput. To achieve this goal, we propose BiSwift, a bi-level framework that scales concurrent real-time video analytics through a novel adaptive hybrid codec integrated with multi-level pipelines, and a global bandwidth controller for multiple video streams. The lower-level front-back-end collaborative mechanism (called the adaptive hybrid codec) locally optimizes accuracy and accelerates end-to-end video analytics for a single stream. The upper-level scheduler aims to achieve accuracy fairness among multiple streams via the global bandwidth controller. Our evaluation shows that BiSwift can perform real-time object detection on 9 streams on an edge device equipped with only an NVIDIA RTX 3070 (8 GB) GPU. BiSwift improves accuracy by 10%$\sim$21% and achieves 1.2$\sim$9$\times$ the throughput of state-of-the-art video analytics pipelines.
