Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment

Jiaqi Wu, Shihao Zhang, Simin Chen, Lixu Wang, Zehua Wang, Wei Chen, Fangyuan He, Zijian Tian, F. Richard Yu, Victor C. M. Leung

TL;DR

This work targets efficient edge deployment of deep learning-based object detection by introducing ED-TOOLBOX, a plug-and-play toolkit that preserves accuracy while reducing model size and latency. It couples a lightweight Reparameterized Dynamic Convolutional Network (Rep-DConvNet) with a Sparse Cross-Attention (SC-A) Joint Module and an Efficient Head to enable real-time edge detection across YOLO/SSD architectures. A new Helmet Band Detection Dataset (HBDD) demonstrates real-world safety-critical detection needs, and extensive experiments show ED-TOOLBOX-enhanced models outperform six SOTA methods in visual surveillance while maintaining edge-friendly resource profiles. Limitations include limited compatibility with Transformer-based detectors, with future work aimed at extending ED-TOOLBOX to transformers and broader tasks, expanding its applicability in edge AI ecosystems.

Abstract

Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios. However, existing edge detection methods face challenges: 1) difficulty balancing detection precision with lightweight models, 2) limited adaptability of generalized deployment designs, and 3) insufficient real-world validation. To address these issues, we propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments. Specifically, we introduce a lightweight Reparameterized Dynamic Convolutional Network (Rep-DConvNet) featuring weighted multi-shape convolutional branches to enhance detection performance. Additionally, we design a Sparse Cross-Attention (SC-A) network with a localized-mapping-assisted self-attention mechanism, enabling a well-crafted joint module for adaptive feature transfer. For real-world applications, we incorporate an Efficient Head into the YOLO framework to accelerate edge model optimization. To demonstrate practical impact, we identify a gap in helmet detection -- overlooking band fastening, a critical safety factor -- and create the Helmet Band Detection Dataset (HBDD). Using ED-TOOLBOX-optimized models, we address this real-world task. Extensive experiments validate the effectiveness of ED-TOOLBOX, with edge detection models outperforming six state-of-the-art methods in visual surveillance simulations, achieving real-time and accurate performance. These results highlight ED-TOOLBOX as a superior solution for edge object detection.

Paper Structure

This paper contains 17 sections, 2 theorems, 25 equations, 13 figures, 7 tables, 2 algorithms.

Key Result

Theorem 1

Under Assumptions 1, 2, and 3, the computational complexity of Rep-DConvNet is lower than that of the original RepVGG and regular convolutional layers.
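The inference-time saving that Theorem 1 refers to rests on structural reparameterization: parallel convolutional branches used in training can be folded into one kernel for inference. The following is a minimal sketch of that general, RepVGG-style idea, not the authors' Rep-DConvNet. It assumes a single channel, no batch normalization, and fixed scalar branch weights; the function names and the example kernels are hypothetical.

```python
# Sketch only: RepVGG-style branch merging under simplifying assumptions
# (single channel, no batch norm, fixed scalar branch weights).

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pad_to(kernel, size):
    """Zero-pad an odd-sized kernel so it sits centred in a size x size kernel."""
    kh, kw = len(kernel), len(kernel[0])
    top, left = (size - kh) // 2, (size - kw) // 2
    padded = [[0.0] * size for _ in range(size)]
    for di in range(kh):
        for dj in range(kw):
            padded[top + di][left + dj] = kernel[di][dj]
    return padded


def merge_branches(kernels, weights, size=3):
    """Fold weighted multi-shape branches into one size x size kernel."""
    merged = [[0.0] * size for _ in range(size)]
    for kernel, weight in zip(kernels, weights):
        padded = pad_to(kernel, size)
        for i in range(size):
            for j in range(size):
                merged[i][j] += weight * padded[i][j]
    return merged


def multi_branch_forward(image, kernels, weights):
    """Training-time view: run every branch, then take the weighted sum."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for kernel, weight in zip(kernels, weights):
        branch = conv2d_same(image, kernel)
        for i in range(h):
            for j in range(w):
                out[i][j] += weight * branch[i][j]
    return out


# Hypothetical example: 3x3, 1x3, 3x1 and 1x1 branches on a 4x4 input.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernels = [
    [[1, 0, -1], [2, 0, -2], [1, 0, -1]],  # 3x3
    [[1, 2, 1]],                           # 1x3
    [[1], [0], [-1]],                      # 3x1
    [[3]],                                 # 1x1
]
weights = [0.5, 0.25, 0.125, 0.125]

train_out = multi_branch_forward(image, kernels, weights)          # four convolutions
infer_out = conv2d_same(image, merge_branches(kernels, weights))   # one convolution
max_gap = max(abs(a - b) for ra, rb in zip(train_out, infer_out) for a, b in zip(ra, rb))
```

Because convolution is linear, the merged kernel reproduces the multi-branch output while needing only one convolution per layer at inference. If the branch weights depend on the input, as the word "dynamic" suggests, the merge would have to be recomputed whenever the weights change; the sketch does not model that.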

Figures (13)

  • Figure 1: The video surveillance system is based on an Internet of Video Things (IoVT) environment. The black dashed line distinguishes between Centralized Cloud Computing and Edge Computing. In edge computing, data captured by CCTV Cameras does not need to be uploaded to a Cloud Control Center; instead, computation is performed directly on Edge Computing Devices deployed in the Working Scenes.
  • Figure 2: Different helmet-wearing behaviours. Red boxes indicate no helmet worn, green boxes indicate a helmet worn correctly, and yellow boxes indicate a helmet worn with the helmet band left unfastened. Leaving the helmet band unfastened is a common improper helmet-wearing behaviour.
  • Figure 3: Overview of the ED-TOOLBOX. It includes the Reparameterized Dynamic Convolutional Network (Rep-DConvNet), the Joint module, and the Efficient Head. These plug-and-play components achieve edge deployment of detection models and maintain excellent performance through "replacement" and "insertion."
  • Figure 4: The structure of Rep-DConvNet. It decouples the Training phase from the Inference phase, and is divided into a Basic network and a Downsampling network according to the design requirements of the detection framework [ali2024yolo].
  • Figure 5: Procedure of Sparse Cross-Attention (SC-A). At the Pre-processing Stage, SC-A employs multiple pooling operations to perform local embedding mappings, reducing the complexity of subsequent computations. At the Attention Stage, it performs self-attention computation on the two pooling results. Finally, at the Post-processing Stage, it performs expansion operations to obtain the final attention result.
  • ...and 8 more figures
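
The three stages named in the Figure 5 caption (pooling, attention on the pooled results, expansion) can be illustrated with a small one-dimensional sketch. This is an assumption-laden illustration, not the paper's SC-A: the choice of average and max pooling, the use of one pooled sequence as queries and the other as keys and values, the nearest-neighbour expansion, and all function names are hypothetical.

```python
# Sketch only: pool -> attend -> expand on a 1D token sequence.
# Pooling types, query/key roles and the expansion rule are assumptions.
import math


def avg_pool(tokens, window):
    """Average non-overlapping windows of token vectors."""
    dim = len(tokens[0])
    return [
        [sum(tok[d] for tok in tokens[s:s + window]) / window for d in range(dim)]
        for s in range(0, len(tokens), window)
    ]


def max_pool(tokens, window):
    """Element-wise maximum over non-overlapping windows of token vectors."""
    dim = len(tokens[0])
    return [
        [max(tok[d] for tok in tokens[s:s + window]) for d in range(dim)]
        for s in range(0, len(tokens), window)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on small lists of vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        probs = softmax(scores)
        out.append([sum(p * v[d] for p, v in zip(probs, values))
                    for d in range(len(values[0]))])
    return out


def expand(pooled, window):
    """Repeat each pooled vector `window` times to restore the original length."""
    return [list(vec) for vec in pooled for _ in range(window)]


def sparse_attention_sketch(tokens, window):
    """Pre-process (pool), attend on the pooled sequences, post-process (expand)."""
    if len(tokens) % window != 0:
        raise ValueError("sequence length must be a multiple of the pooling window")
    pooled_avg = avg_pool(tokens, window)   # assumed source of queries
    pooled_max = max_pool(tokens, window)   # assumed source of keys and values
    attended = attention(pooled_avg, pooled_max, pooled_max)
    return expand(attended, window)


# Hypothetical input: 8 tokens of dimension 2, pooled in windows of 4.
tokens = [[0.1 * i, 1.0 - 0.1 * i] for i in range(8)]
result = sparse_attention_sketch(tokens, window=4)
```

With 8 tokens and a window of 4, the attention scores form a 2 x 2 matrix instead of the 8 x 8 matrix that full self-attention would need, which is the kind of saving the caption attributes to the Pre-processing Stage.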

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2