DIR-BHRNet: A Lightweight Network for Real-time Vision-based Multi-person Pose Estimation on Smartphones
Gongjin Lan, Yu Wu, Qi Hao
TL;DR
DIR-BHRNet targets real-time multi-person pose estimation (MPPE) on smartphones by integrating Dense Inverted Residual (DIR) blocks into a Balanced HRNet (BHRNet) backbone, which reduces computation while maintaining high accuracy. The network is trained with a weighted sum of a heatmap loss $L_H$ and an associative embedding loss $L_T$, $L = \alpha L_H + \beta L_T$ with $\alpha = 0.99$ and $\beta = 0.01$. It shows competitive performance on COCO and CrowdPose and is deployed in real time on Android devices via NCNN. The two main contributions are the DIR module and the Balanced HRNet backbone, which together yield a favorable accuracy-versus-cost trade-off (e.g., DIR-BHRNet-32 achieves ~50.5 mAP at modest GFLOPs) and real-time inference (>10 FPS) on mainstream smartphones. The public release of the Android executable and source code supports practical adoption of real-time, vision-based MPPE on consumer devices.
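The loss weighting above can be illustrated with a minimal sketch. The function names `heatmap_mse` and `combined_loss` are hypothetical, and computing the heatmap term as a mean squared error is an assumption made for illustration; only the weights $\alpha = 0.99$ and $\beta = 0.01$ come from the summary. The associative embedding term is passed in as a precomputed scalar.

```python
def heatmap_mse(pred, target):
    """Mean squared error between two flat heatmaps of equal length.

    Assumption: the heatmap loss is an MSE; the summary does not say.
    """
    if len(pred) != len(target):
        raise ValueError("heatmaps must have the same number of elements")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(loss_h, loss_t, alpha=0.99, beta=0.01):
    """Weighted sum L = alpha * L_H + beta * L_T.

    loss_h: heatmap loss value (L_H)
    loss_t: associative embedding loss value (L_T), computed elsewhere
    """
    return alpha * loss_h + beta * loss_t


if __name__ == "__main__":
    # Toy values only; they are not results from the paper.
    l_h = heatmap_mse([0.0, 1.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.0])
    l_t = 2.0
    print(combined_loss(l_h, l_t))
```

With these weights the heatmap term dominates the total, so the embedding term mainly acts as a small grouping signal on top of keypoint localization.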
Abstract
Human pose estimation (HPE), particularly multi-person pose estimation (MPPE), has been applied in many domains such as human-machine systems. However, current MPPE methods generally run on powerful GPU systems and incur high computational costs. Real-time MPPE on mobile devices with limited computing power remains a challenging task. In this paper, we propose a lightweight neural network, DIR-BHRNet, for real-time MPPE on smartphones. In DIR-BHRNet, we design a novel lightweight convolutional module, Dense Inverted Residual (DIR), which improves accuracy by adding a depthwise convolution and a shortcut connection to the well-known Inverted Residual, and a novel efficient network structure, Balanced HRNet (BHRNet), which reduces computational costs by reconfiguring the number of convolutional blocks on each branch. We evaluate DIR-BHRNet on the well-known COCO and CrowdPose datasets. The results show that DIR-BHRNet outperforms state-of-the-art methods in accuracy at a computational cost compatible with real-time inference. Finally, we implement DIR-BHRNet on current mainstream Android smartphones, where it runs at more than 10 FPS. The free-to-use executable file (Android 10), source code, and a video description of this work are publicly available online to facilitate the development of real-time MPPE on smartphones.
