AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics

Xiangxiang Dai, Zeyu Zhang, Peng Yang, Yuedong Xu, Xutong Liu, John C. S. Lui

TL;DR

AxiomVision tackles the problem of achieving accurate video analytics under diverse environments and camera viewpoints by enabling dynamic, online selection among a tiered edge-cloud set of visual models. It introduces a continual online learning framework that incorporates camera perspective effects through perspective-weight vectors, and leverages a graph-based grouping of cameras to accelerate model selection. The approach comes with theoretical guarantees on regret and achieves substantial empirical gains, including a reported 25.7% accuracy improvement, while reducing bandwidth and latency via combinatorial model sets and topology-aware grouping. The work demonstrates practical scalability for large camera networks and provides open-source code to enable deployment and further research in perspective-aware adaptive video analytics.

Abstract

The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to effectively handle diverse tasks and varying environments. This work introduces AxiomVision, a novel framework that can guarantee accuracy by leveraging edge computing to dynamically select the most efficient visual models for video analytics under diverse scenarios. Utilizing a tiered edge-cloud architecture, AxiomVision enables the deployment of a broad spectrum of visual models, from lightweight to complex DNNs, that can be tailored to specific scenarios while considering camera source impacts. In addition, AxiomVision provides three core innovations: (1) a dynamic visual model selection mechanism utilizing continual online learning, (2) an online method that efficiently accounts for the influence of the camera's perspective, and (3) a topology-driven grouping approach that accelerates the model selection process. With rigorous theoretical guarantees, these advancements provide a scalable and effective solution for visual tasks inherent to multimedia systems, such as object detection, classification, and counting. Empirically, AxiomVision achieves a 25.7% improvement in accuracy.
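
To make the idea of online model selection with regret guarantees concrete, the following is a minimal sketch using the textbook UCB1 bandit rule. It is not AxiomVision's algorithm: it ignores perspective-weight vectors and camera grouping, and the model names, accuracy values, and the `simulate` helper are hypothetical, chosen only for illustration. Each round the selector picks one candidate model, observes a 0/1 accuracy reward, and updates its running estimate.

```python
import math
import random


class UCB1ModelSelector:
    """Textbook UCB1 over a fixed set of candidate visual models.

    Illustrative only: model names are hypothetical and this is not
    the selection rule used in the paper.
    """

    def __init__(self, model_names):
        self.model_names = list(model_names)
        self.counts = {m: 0 for m in self.model_names}
        self.mean_reward = {m: 0.0 for m in self.model_names}
        self.total_pulls = 0

    def select(self):
        # Try every model once before relying on confidence bounds.
        for m in self.model_names:
            if self.counts[m] == 0:
                return m

        def ucb(m):
            # Empirical mean plus an exploration bonus that shrinks
            # as the model is selected more often.
            bonus = math.sqrt(2.0 * math.log(self.total_pulls) / self.counts[m])
            return self.mean_reward[m] + bonus

        return max(self.model_names, key=ucb)

    def update(self, model, reward):
        # Incremental update of the running mean reward for this model.
        self.counts[model] += 1
        self.total_pulls += 1
        n = self.counts[model]
        self.mean_reward[model] += (reward - self.mean_reward[model]) / n


def simulate(rounds=3000, seed=0):
    """Run the selector against made-up per-model accuracies."""
    rng = random.Random(seed)
    # Assumed (hypothetical) probability that each model's output is correct.
    true_accuracy = {"light_edge": 0.3, "mid_edge": 0.5, "heavy_cloud": 0.9}
    selector = UCB1ModelSelector(true_accuracy)
    for _ in range(rounds):
        model = selector.select()
        reward = 1.0 if rng.random() < true_accuracy[model] else 0.0
        selector.update(model, reward)
    return selector


if __name__ == "__main__":
    result = simulate()
    print(result.counts)
```

In this toy setting the selector spends most rounds on the model with the highest assumed accuracy while still occasionally sampling the others. A perspective-aware variant would additionally condition the reward estimate on camera features, which this sketch does not attempt.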

Paper Structure

This paper contains 22 sections, 9 equations, 21 figures, 4 tables, 1 algorithm.

Figures (21)

  • Figure 1: Comparative analysis of visual model performance across different environmental conditions.
  • Figure 2: Role of camera perspective in object detection and semantic segmentation visual tasks.
  • Figure 3: Semantic segmentation across diverse camera perspectives for the same dancer.
  • Figure 4: Overview of AxiomVision framework.
  • Figure 5: Comparison when perspective is not considered.
  • ...and 16 more figures