AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics
Xiangxiang Dai, Zeyu Zhang, Peng Yang, Yuedong Xu, Xutong Liu, John C. S. Lui
TL;DR
AxiomVision tackles the problem of achieving accurate video analytics under diverse environments and camera viewpoints by enabling dynamic, online selection among a tiered edge-cloud set of visual models. It introduces a continual online learning framework that incorporates camera perspective effects through perspective-weight vectors, and leverages a graph-based grouping of cameras to accelerate model selection. The approach comes with theoretical guarantees on regret and achieves substantial empirical gains, including a reported 25.7% accuracy improvement, while reducing bandwidth and latency via combinatorial model sets and topology-aware grouping. The work demonstrates practical scalability for large camera networks and provides open-source code to enable deployment and further research in perspective-aware adaptive video analytics.
Abstract
The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to effectively handle diverse tasks and varying environments. This work introduces AxiomVision, a novel framework that can guarantee accuracy by leveraging edge computing to dynamically select the most efficient visual models for video analytics under diverse scenarios. Utilizing a tiered edge-cloud architecture, AxiomVision enables the deployment of a broad spectrum of visual models, from lightweight to complex DNNs, that can be tailored to specific scenarios while considering camera source impacts. In addition, AxiomVision provides three core innovations: (1) a dynamic visual model selection mechanism utilizing continual online learning, (2) an efficient online method that takes into account the influence of the camera's perspective, and (3) a topology-driven grouping approach that accelerates the model selection process. With rigorous theoretical guarantees, these advancements provide a scalable and effective solution for visual tasks inherent to multimedia systems, such as object detection, classification, and counting. Empirically, AxiomVision achieves a 25.7% improvement in accuracy.
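To make the idea of online model selection concrete, the following is a minimal sketch, not the AxiomVision algorithm itself. It assumes a plain UCB1 bandit over three hypothetical models ("light", "medium", "heavy") with made-up accuracy values, and it ignores the perspective-weight vectors and the camera grouping described above. The class name, function name, and reward model are illustrative assumptions only.

```python
import math
import random


class UCBModelSelector:
    """UCB1 selector over a fixed set of candidate visual models.

    Assumption: rewards are in [0, 1], e.g. a per-frame accuracy signal.
    This is a generic bandit sketch, not the paper's method.
    """

    def __init__(self, model_names):
        self.model_names = list(model_names)
        self.counts = [0] * len(self.model_names)
        self.mean_reward = [0.0] * len(self.model_names)
        self.total_rounds = 0

    def select(self):
        # Try every model once before relying on confidence bounds.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        # Empirical mean plus an exploration bonus that shrinks with use.
        scores = [
            self.mean_reward[i]
            + math.sqrt(2.0 * math.log(self.total_rounds) / self.counts[i])
            for i in range(len(self.model_names))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, index, reward):
        # Incremental update of the running mean reward for the chosen model.
        self.counts[index] += 1
        self.total_rounds += 1
        self.mean_reward[index] += (reward - self.mean_reward[index]) / self.counts[index]


def simulate(true_accuracy, rounds=2000, seed=0):
    """Run the selector against hypothetical Bernoulli accuracy feedback."""
    rng = random.Random(seed)
    selector = UCBModelSelector(true_accuracy.keys())
    for _ in range(rounds):
        i = selector.select()
        name = selector.model_names[i]
        reward = 1.0 if rng.random() < true_accuracy[name] else 0.0
        selector.update(i, reward)
    return selector


if __name__ == "__main__":
    # Hypothetical accuracies, chosen only for illustration.
    result = simulate({"light": 0.4, "medium": 0.6, "heavy": 0.9})
    for name, count in zip(result.model_names, result.counts):
        print(name, count)
```

In this toy setting the selector concentrates its choices on the model with the highest simulated accuracy while still occasionally probing the others. A perspective-aware variant would additionally condition the reward estimate on camera context, which this sketch deliberately leaves out.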
