LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging
Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li
TL;DR
This work tackles the problem of estimating 3D human pose and shape directly from lensless measurements, addressing privacy concerns and the limitations of approaches that first reconstruct an RGB image. It introduces LPSNet, an end-to-end framework comprising a Multi-Scale Lensless Feature Decoder (MSFDecoder) and a Double-Head Auxiliary Supervision (DHAS) mechanism, which extract robust multi-scale features and improve limb accuracy. The method regresses SMPL parameters while using auxiliary supervision (SimCC keypoints and IUV dense maps) to enhance 2D/3D alignment, achieving lower MPJPE, PA-MPJPE, and PVE than baselines on lensless data. The work demonstrates the feasibility and potential of privacy-preserving, compact 3D human pose estimation in real-world scenarios. It also acknowledges data scarcity as a bottleneck and outlines pathways to larger lensless datasets and pretraining for broader generalization.
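For readers unfamiliar with the metrics named above, the following is a minimal sketch of how MPJPE and PVE are commonly computed; it is not taken from the LPSNet code. The function names, the root-joint alignment step, and the choice of joint 0 as root are assumptions for illustration. PA-MPJPE additionally applies a Procrustes (similarity) alignment before measuring the error and is omitted here.

```python
import math


def mean_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def root_align(points, root=0):
    """Translate points so that the root point sits at the origin (assumed convention)."""
    r = points[root]
    return [tuple(c - rc for c, rc in zip(p, r)) for p in points]


def mpjpe(pred_joints, gt_joints, root=0):
    """Mean Per Joint Position Error after root alignment."""
    return mean_point_error(root_align(pred_joints, root), root_align(gt_joints, root))


def pve(pred_vertices, gt_vertices):
    """Per Vertex Error: the same mean distance, taken over mesh vertices."""
    return mean_point_error(pred_vertices, gt_vertices)


# Hypothetical 3-joint skeleton, coordinates in metres.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.03), (0.0, 2.04, 0.0)]
error_m = mpjpe(pred, gt)
```

Because of the root alignment, a prediction that is merely translated relative to the ground truth yields the same MPJPE, so the metric reflects pose error rather than global position error.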
Abstract
Human pose and shape (HPS) estimation with lensless imaging benefits privacy protection and, because lensless devices are small and structurally simple, also suits covert surveillance scenarios. However, the task is challenging: the captured measurements are inherently ambiguous, and effective methods for estimating human pose and shape directly from lensless data are lacking. In this paper, we propose what is, to our knowledge, the first end-to-end framework for recovering 3D human pose and shape from lensless measurements. We design a multi-scale lensless feature decoder that decodes the measurements optically encoded by the mask for efficient feature extraction. We also propose a double-head auxiliary supervision mechanism to improve estimation accuracy at the ends of human limbs. In addition, we build a lensless imaging system and verify the effectiveness of our method on various datasets acquired with it.
