BinaryHPE: 3D Human Pose and Shape Estimation via Binarization
Zhiteng Li, Yulun Zhang, Jing Lin, Haotong Qin, Jinjin Gu, Xin Yuan, Linghe Kong, Xiaokang Yang
TL;DR
3D human pose and shape estimation is powerful but resource-intensive. BinaryHPE introduces a binarized framework built on a BiDRN backbone composed of BiDRB blocks, together with a Binarized BoxNet, to preserve essential full-precision information while substantially reducing memory and computation. The method outperforms existing state-of-the-art binarization approaches and achieves performance comparable to the full-precision Hand4Whole on key benchmarks while using only 22.1% of its parameters and 14.8% of its operations. This brings efficient 3D mesh recovery within reach of resource-limited edge devices, for applications such as AR/VR, sign language, and emotion recognition, and advances the practical deployment of whole-body HPE.
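To make "binarized" concrete, the sketch below shows the forward pass of a generic binary layer. Both the weights and the activations are reduced to ±1, so a dot product becomes an XNOR followed by a popcount, and a real-valued identity shortcut carries full-precision information past the 1-bit computation. This is not BiDRN or BiDRB: the summary does not specify their internals. The XNOR-Net-style scaling factor (alpha = mean |w|), the single identity shortcut standing in for the "dual residual" design, and the function name `binary_residual_unit` are all illustrative assumptions. Training such a layer would also need a gradient approximation for `sign`, such as a straight-through estimator, which is not shown here.

```python
def sign(x: float) -> float:
    """Binarize a real value to +1 or -1 (0 maps to +1 by convention)."""
    return 1.0 if x >= 0 else -1.0


def binarize_weights(w: list[float]) -> tuple[list[float], float]:
    """XNOR-Net-style binarization (assumed): w is approximated by alpha * sign(w), alpha = mean(|w|)."""
    alpha = sum(abs(v) for v in w) / len(w)
    return [sign(v) for v in w], alpha


def pack_bits(v: list[float]) -> int:
    """Pack a +1/-1 vector into an integer bitmask (bit i is set iff v[i] == +1)."""
    mask = 0
    for i, x in enumerate(v):
        if x > 0:
            mask |= 1 << i
    return mask


def xnor_popcount_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    Positions where the signs agree contribute +1 and the rest contribute -1, so
    dot = matches - (n - matches) = 2 * popcount(XNOR(a, b)) - n.
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # keep only the n valid bits
    matches = bin(xnor).count("1")
    return 2 * matches - n


def binary_residual_unit(x: list[float], weight_rows: list[list[float]]) -> list[float]:
    """Hypothetical square binary linear layer with a full-precision identity shortcut.

    Weights and activations are 1-bit inside the layer, but the real-valued input x
    is added back to the output, so full-precision information bypasses binarization.
    """
    n = len(x)
    x_bits = pack_bits([sign(v) for v in x])  # 1-bit activations
    out = []
    for i, row in enumerate(weight_rows):
        w_bin, alpha = binarize_weights(row)
        dot = xnor_popcount_dot(pack_bits(w_bin), x_bits, n)
        out.append(alpha * dot + x[i])  # rescale, then add the full-precision shortcut
    return out


x = [0.5, -1.0, 2.0]
w = [[0.5, -0.5, 0.5], [-1.0, 1.0, 1.0], [0.25, 0.25, -0.25]]
print(binary_residual_unit(x, w))  # [2.0, -2.0, 1.75]
```

Without the `+ x[i]` term, every output would be restricted to alpha times an odd integer in [-n, n]. The shortcut is one simple way to let real-valued detail survive a 1-bit layer, which is the kind of information loss a binary backbone is designed to mitigate.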
Abstract
3D human pose and shape estimation (HPE) aims to reconstruct the 3D human body, face, and hands from a single image. Although powerful deep learning models have achieved accurate estimation in this task, they require enormous memory and computational resources. Consequently, these methods are difficult to deploy on resource-limited edge devices. In this work, we propose BinaryHPE, a novel binarization method designed to efficiently estimate the parameters of the 3D human body, face, and hands. Specifically, we propose a novel binary backbone called the Binarized Dual Residual Network (BiDRN), designed to retain as much full-precision information as possible. Furthermore, we propose the Binarized BoxNet, an efficient sub-network for predicting face and hand bounding boxes, which further reduces model redundancy. Comprehensive quantitative and qualitative experiments demonstrate the effectiveness of BinaryHPE, which significantly improves over state-of-the-art binarization algorithms. Moreover, BinaryHPE achieves performance comparable to the full-precision method Hand4Whole while using only 22.1% of its parameters and 14.8% of its operations. We will release all the code and pretrained models.
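The abstract reports relative parameter and operation counts but does not state how they are computed. The sketch below assumes a convention that is common in the binary-network literature (for example in Bi-Real Net and ReActNet). Under this convention, a 1-bit weight counts as 1/32 of a 32-bit parameter, and operations are counted as OPs = BOPs / 64 + FLOPs, with one multiply-accumulate counted as one operation. Whether BinaryHPE uses exactly this accounting is an assumption. The `Layer` class, the layer names, and every number below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    params: int      # number of weights
    macs: int        # multiply-accumulates per forward pass
    binarized: bool  # True if both weights and activations are 1-bit


def full_precision_cost(layers: list[Layer]) -> tuple[int, int]:
    """Parameters and operations if every layer stayed 32-bit (1 MAC = 1 op)."""
    return sum(layer.params for layer in layers), sum(layer.macs for layer in layers)


def binarized_cost(layers: list[Layer]) -> tuple[float, float]:
    """32-bit-equivalent cost under the assumed convention: a 1-bit weight counts as
    1/32 of a parameter, and binary ops count as 1/64 of a FLOP (OPs = BOPs / 64 + FLOPs)."""
    params = sum(layer.params / 32 if layer.binarized else layer.params for layer in layers)
    ops = sum(layer.macs / 64 if layer.binarized else layer.macs for layer in layers)
    return params, ops


# Hypothetical model: full-precision stem and head, binarized body.
model = [
    Layer("stem", params=10_000, macs=100_000_000, binarized=False),
    Layer("body", params=8_000_000, macs=3_200_000_000, binarized=True),
    Layer("head", params=500_000, macs=10_000_000, binarized=False),
]
fp_params, fp_ops = full_precision_cost(model)
bin_params, bin_ops = binarized_cost(model)
print(f"{bin_params / fp_params:.1%} params, {bin_ops / fp_ops:.1%} ops")  # 8.9% params, 4.8% ops
```

In this toy model, the layers kept at full precision (stem and head) dominate the remaining cost. This is why ratios like the ones in the abstract depend heavily on which parts of a network stay at full precision.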
