MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices
Dongyang Yu, Haoyue Zhang, Ruisheng Zhao, Guoqi Chen, Wangpeng An, Yanhong Yang
TL;DR
MovePose tackles the gap in real-time, accurate human pose estimation on mobile and edge devices by introducing a lightweight CNN that combines large kernel convolutions, a deconvolution upsampling path, and the SimCC coordinate-classification approach. It couples these with a Lite pre-training strategy and a MobileNet backbone to achieve high accuracy at low computational cost, reporting 68.0 mAP on COCO val at 69+ fps on an Intel i9-10920x CPU, 452+ fps on an NVIDIA RTX3090 GPU, and above 11 fps on an Android phone. The method demonstrates strong performance across the COCO test-dev, COCO-SinglePerson, and MPII benchmarks, with notable gains when using flip testing and efficient architectures. Overall, MovePose shows that edge-oriented human pose estimation can attain real-time speeds without sacrificing accuracy, enabling practical deployment in sports analytics, robotics, and AR applications.
Abstract
We present MovePose, an optimized lightweight convolutional neural network designed specifically for real-time body pose estimation on CPU-based mobile devices. Current solutions do not provide satisfactory accuracy and speed for human pose estimation, and MovePose addresses this gap. It aims to maintain real-time performance while improving the accuracy of human pose estimation on mobile devices. MovePose attains a mean Average Precision (mAP) of 68.0 on the COCO \cite{cocodata} validation set. It runs at 69+ frames per second (fps) on an Intel i9-10920x CPU and at 452+ fps on an NVIDIA RTX3090 GPU. On an Android phone equipped with a Snapdragon 8 + 4G processor, it exceeds 11 fps. To enhance accuracy, we incorporate three techniques: deconvolution, large kernel convolution, and coordinate classification. Compared to basic upsampling, deconvolution is trainable, improves model capacity, and enlarges the receptive field. Large kernel convolution strengthens these properties at a decreased computational cost. In summary, MovePose provides high accuracy and real-time performance, making it a potential tool for a variety of applications, including mobile-side human pose estimation. The code and models for this algorithm will be made publicly available.
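To make the coordinate classification idea concrete, the following is a minimal sketch of how a SimCC-style output could be decoded into a keypoint location. It is an illustration, not the MovePose implementation: the function name `simcc_decode`, the split ratio of 2, the toy bin counts, and the min-of-peaks confidence are all assumptions chosen for this example. The only idea taken from the approach itself is that each axis is predicted as a classification over bins, and the winning bin index is scaled back to pixel units.

```python
def simcc_decode(x_logits, y_logits, split_ratio=2.0):
    """Decode one keypoint from per-axis classification scores.

    x_logits holds one score per horizontal bin, y_logits one per vertical bin.
    With split ratio k, an input of width W is divided into W * k bins,
    so dividing the winning bin index by k maps it back to pixel units.
    """
    # Index of the highest-scoring bin on each axis (argmax).
    x_bin = max(range(len(x_logits)), key=x_logits.__getitem__)
    y_bin = max(range(len(y_logits)), key=y_logits.__getitem__)
    # Confidence as the smaller of the two peak scores (an assumed convention).
    score = min(x_logits[x_bin], y_logits[y_bin])
    return x_bin / split_ratio, y_bin / split_ratio, score


# Hypothetical 8x6 pixel input with split ratio 2: 16 x-bins and 12 y-bins.
x_logits = [0.0] * 16
y_logits = [0.0] * 12
x_logits[5] = 3.0  # peak in x-bin 5
y_logits[8] = 2.0  # peak in y-bin 8

x, y, score = simcc_decode(x_logits, y_logits)
```

Because the bins are finer than the pixel grid when the split ratio exceeds 1, the decoded coordinate can fall between pixels (here, x-bin 5 maps to 2.5 pixels), which is what lets a classification head reach sub-pixel precision without a high-resolution heatmap.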
