Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection?
Kun Guo, Yun Shen, Xijun Wang, Chaoqun You, Yun Rui, Tony Q. S. Quek
TL;DR
This work addresses efficient video object recognition on resource-constrained devices by jointly leveraging local tracking and edge-based detection. It introduces LTED-Ada, a deep reinforcement learning (DRL)-based policy that selects between on-device tracking and edge detection using a queue-aware Markov decision process (MDP), and extends it to a multi-device setting with federated learning to improve generalization to unseen frame rates and performance requirements. The authors formulate long-term optimization problems for the single-device ($P_0$) and multi-device ($P_1$) scenarios and demonstrate, via hardware-in-the-loop experiments with Raspberry Pi 4B devices and a PC edge server, that LTED-Ada outperforms several baselines by balancing recognition accuracy, handling delay, and waiting delay. The results indicate practical viability for edge-assisted video analytics in mobile networks, with the federated approach providing robustness to dynamic workloads and frame-rate variations.
Abstract
Fast and accurate video object recognition, which relies on frame-by-frame video analytics, remains a challenge for resource-constrained devices such as traffic cameras. Recent advances in mobile edge computing have made it possible to offload computation-intensive object detection to edge servers equipped with high-accuracy neural networks, while lightweight and fast object tracking algorithms run locally on devices. This hybrid approach offers a promising solution but introduces a new challenge: deciding when to perform edge detection versus local tracking. To address this, we formulate two long-term optimization problems, one for the single-device scenario and one for the multi-device scenario, taking into account the temporal correlation of consecutive frames and the dynamic conditions of mobile edge networks. Based on these formulations, we propose LTED-Ada for the single-device setting, a deep reinforcement learning-based algorithm that adaptively selects between local tracking and edge detection according to the frame rate and the recognition accuracy and delay requirements. In the multi-device setting, we further enhance LTED-Ada with federated learning to enable collaborative policy training across devices, thereby improving its generalization to unseen frame rates and performance requirements. Finally, we conduct extensive hardware-in-the-loop experiments using multiple Raspberry Pi 4B devices and a personal computer as the edge server, demonstrating the superiority of LTED-Ada over several baselines.
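To make the collaborative training step more concrete, the following is a minimal sketch of how per-device policy parameters could be combined on a server. The abstract does not state the aggregation rule, so this assumes plain sample-weighted federated averaging (FedAvg), which may differ from what LTED-Ada actually uses. The function name `federated_average`, the parameter layout (a dictionary of flat float lists), and the sample counts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical layout: each device's policy is a dict mapping a layer name
# to a flat list of floats. A real DRL policy would hold tensors instead.
Params = Dict[str, List[float]]


def federated_average(local_params: List[Params], num_samples: List[int]) -> Params:
    """Sample-weighted average of per-device parameters (FedAvg-style).

    Assumption: every device reports the same layer names and shapes.
    """
    if not local_params or len(local_params) != len(num_samples):
        raise ValueError("need one sample count per device and at least one device")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in local_params[0]:
        length = len(local_params[0][name])
        # Weight each device's value by its share of the total samples.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(local_params, num_samples)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical devices; the second contributed three times as many samples.
device_a = {"w": [1.0, 2.0]}
device_b = {"w": [3.0, 4.0]}
global_params = federated_average([device_a, device_b], [1, 3])
print(global_params)
```

In this toy run the second device dominates the average because it carries three quarters of the samples. Whether the paper weights devices this way, or uniformly, is not specified in the abstract.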
