LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li

TL;DR

This work tackles the problem of estimating 3D human pose and shape directly from lensless measurements, addressing privacy concerns and the limitations of approaches that first reconstruct an RGB image. It introduces LPSNet, an end-to-end framework comprising a Multi-Scale Lensless Feature Decoder (MSFDecoder) and a Double-Head Auxiliary Supervision (DHAS) mechanism, which extract robust multi-scale features and improve limb accuracy. The method regresses SMPL parameters while using auxiliary supervision (SimCC keypoints and IUV dense maps) to improve 2D/3D alignment, and it achieves lower MPJPE, PA-MPJPE, and PVE errors than the baselines on lensless data. The work demonstrates the feasibility of privacy-preserving, compact 3D human pose estimation in real-world scenarios, while acknowledging data scarcity as a bottleneck and outlining larger lensless datasets and pretraining as routes to broader generalization.
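As a rough illustration of the error metrics named above: MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, and PVE applies the same formula to mesh vertices. PA-MPJPE first rigidly aligns the prediction to the ground truth (Procrustes alignment), which this sketch omits. The function name `mpjpe` and the sample coordinates are hypothetical and not taken from the paper.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between matching 3D points.

    Works for joints (MPJPE) or mesh vertices (PVE); the result is in
    the same units as the input coordinates.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Hypothetical 3-joint example, coordinates in millimetres.
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (100.0, 30.0, 40.0), (0.0, 200.0, 60.0)]

# Per-joint errors are 0, 50 and 60 mm, so the mean is 110 / 3 mm.
error_mm = mpjpe(pred, gt)
```

Because all three metrics are distances to the ground truth, lower values indicate a better estimate.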

Abstract

Human pose and shape (HPS) estimation with lensless imaging not only benefits privacy protection but can also be used in covert surveillance scenarios, owing to the small size and simple structure of the device. However, the task is challenging because the captured measurements are inherently ambiguous, and effective methods for estimating human pose and shape directly from lensless data are lacking. In this paper, we propose what is, to our knowledge, the first end-to-end framework for recovering 3D human poses and shapes from lensless measurements. Specifically, we design a multi-scale lensless feature decoder that decodes the lensless measurements formed through the optically encoded mask for efficient feature extraction. We also propose a double-head auxiliary supervision mechanism to improve estimation accuracy at the ends of human limbs. In addition, we build a lensless imaging system and verify the effectiveness of our method on various datasets acquired with it.

Paper Structure (16 sections, 7 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging. We contribute a framework for estimating human poses and shapes from individual lensless measurements. The first row shows the input measurements acquired by our lensless imaging system, the second row shows the human poses and shapes estimated from those measurements, and the bottom row shows the 3D results from different views.
  • Figure 2: The workflow of the lensless imaging system. The final measurement is the result of the system encoding the 3D scene information. The optically encoded mask transforms local information in the 3D scene into overlapping global information.
  • Figure 3: Overview of the proposed framework. A measurement $M$ is passed through a Multi-Scale Lensless Feature Decoder to obtain spatial features at different scales. These feature maps are fed into the regressor for human pose and shape estimation. They are also fed into the Double-Head Auxiliary Supervision to improve the estimation accuracy.
  • Figure 4: Top left: the dual-camera system we built, which consists of an RGB camera and a lensless imaging system that uses a diffuser as a mask. Top right: the process of collecting our dataset. Bottom: a record of the dataset collection process.
  • Figure 5: Top: the framework of our baseline. An image is first recovered from the lensless measurement by a reconstruction method, and human pose and shape are then estimated by PyMAF. Bottom: a comparison of the recovered image with the original image, showing that the quality of the recovered image drops dramatically.
  • ...and 3 more figures
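
The encoding described in the Figure 2 caption, where the mask turns local scene information into overlapping global information, can be sketched with a shift-invariant convolution model. This model is a common approximation for mask-based lensless cameras and is an assumption here, not the authors' stated formulation. The function `convolve2d_full` and the toy `scene` and `psf` arrays are hypothetical.

```python
def convolve2d_full(scene, psf):
    """Full 2D convolution of a scene with a point spread function (PSF).

    Each scene pixel stamps a scaled copy of the PSF onto the sensor,
    so every scene point contributes to many sensor pixels.
    """
    sh, sw = len(scene), len(scene[0])
    ph, pw = len(psf), len(psf[0])
    out = [[0.0] * (sw + pw - 1) for _ in range(sh + ph - 1)]
    for i in range(sh):
        for j in range(sw):
            if scene[i][j] == 0:
                continue  # dark pixels contribute nothing
            for u in range(ph):
                for v in range(pw):
                    out[i + u][j + v] += scene[i][j] * psf[u][v]
    return out


# Toy scene: a single bright point in the centre of a 3x3 grid.
scene = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]

# Toy PSF standing in for the mask pattern (assumed, not measured).
psf = [[1, 0, 1],
       [0, 1, 0],
       [1, 0, 1]]

measurement = convolve2d_full(scene, psf)

# Count how many sensor pixels the single scene point lit up.
lit_pixels = sum(1 for row in measurement for value in row if value != 0)
```

In this toy case one scene point spreads over five sensor pixels. With many scene points the stamped copies overlap, which is why the raw measurement is not directly interpretable as an image.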