
A Self-supervised Pressure Map Human Keypoint Detection Approach: Optimizing Generalization and Computational Efficiency Across Datasets

Chengzhang Yu, Xianjun Yang, Wenxia Bao, Shaonan Wang, Zhiming Yao

TL;DR

This work tackles pressure-map-based human keypoint detection with limited labeled data by introducing SPMKD, a self-supervised framework that combines an Encoder-Fuser-Decoder (EFD) model with Classification-to-Regression Weight Transfer (CRWT). The EFD extracts keypoint heatmaps, features, and coordinates end to end, while CRWT provides a two-stage pre-training scheme that improves convergence and reconstruction quality with minimal data. On two pressure-map datasets, SPMKD achieves competitive accuracy with only 0.613G FLOPs and 0.07M parameters, outperforms baselines that rely on manual annotations, and generalizes well to unseen data. The approach offers a practical, computationally efficient solution for privacy-preserving keypoint analysis across datasets and applications.
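The summary says EFD goes from heatmaps to coordinates end to end but does not say how coordinates are read off the heatmaps. One common differentiable option is soft-argmax, sketched below in plain Python as an assumption rather than as the authors' method. The names `soft_argmax_2d`, `heatmaps_to_keypoints`, and the sharpness parameter `beta` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical layout: one heatmap per keypoint, H rows x W columns of raw scores.
Heatmap = List[List[float]]


def soft_argmax_2d(heatmap: Heatmap, beta: float = 1.0) -> Tuple[float, float]:
    """Return the expected (x, y) location under a softmax over the heatmap.

    Unlike a hard argmax, this is differentiable with respect to every
    heatmap value, so gradients from a coordinate-based loss can flow back
    into the heatmap branch.
    """
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)

    # Weighted average of column (x) and row (y) indices.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


def heatmaps_to_keypoints(heatmaps: List[Heatmap],
                          beta: float = 1.0) -> List[Tuple[float, float]]:
    """Convert K heatmaps (K x H x W) into K (x, y) coordinates."""
    return [soft_argmax_2d(h, beta) for h in heatmaps]
```

A larger `beta` makes the output approach the hard argmax; a smaller one averages over a wider neighbourhood of the peak.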

Abstract

In environments where RGB images are inadequate, pressure maps are a viable alternative and have attracted research attention. This study introduces a novel self-supervised pressure map keypoint detection (SPMKD) method, addressing the lack of designs tailored to extracting human keypoints from pressure maps. Central to our contribution is the Encoder-Fuser-Decoder (EFD) model, a robust framework that integrates a lightweight encoder for precise human keypoint detection, a fuser for efficient gradient propagation, and a decoder that transforms human keypoints into reconstructed pressure maps. This structure is further enhanced by the Classification-to-Regression Weight Transfer (CRWT) method, which improves accuracy by first training the model on a classification task and then transferring those weights to the regression task. Together, these components improve keypoint generalization without manual annotations while remaining highly efficient: FLOPs are reduced to only $5.96\%$ and the parameter count to only $1.11\%$ of those of the baseline methods.
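CRWT is described here only as classification training followed by weight transfer to regression. The sketch below shows just the transfer step, assuming the model splits into a shared backbone and a task-specific head identified by a parameter-name prefix. The prefix `"backbone."`, the helpers `init_params` and `transfer_backbone`, and the layer sizes are all hypothetical; which layers the authors actually transfer is not stated in this summary.

```python
import copy
import random
from typing import Dict, List

# Hypothetical parameter store: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def init_params(layer_sizes: Dict[str, int], seed: int) -> Params:
    """Randomly initialise each named layer (illustrative stand-in for a real model)."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.02) for _ in range(n)]
            for name, n in layer_sizes.items()}


def transfer_backbone(classifier: Params, regressor: Params,
                      prefix: str = "backbone.") -> Params:
    """Copy backbone weights trained on the classification task into the
    regression model. The regression head keeps its own initialisation and
    the classification head is discarded."""
    result = copy.deepcopy(regressor)
    for name, weights in classifier.items():
        if not name.startswith(prefix):
            continue  # task-specific head: not transferred
        if name not in result or len(result[name]) != len(weights):
            raise ValueError(f"backbone layer missing or mis-sized: {name}")
        result[name] = list(weights)
    return result


if __name__ == "__main__":
    backbone = {"backbone.layer1": 8, "backbone.layer2": 8}
    clf = init_params({**backbone, "head.cls": 4}, seed=0)
    reg = init_params({**backbone, "head.reg": 2}, seed=1)
    # Stage 1 (omitted): train `clf` on the classification task.
    reg = transfer_backbone(clf, reg)
    # Stage 2 (omitted): fine-tune `reg` on the keypoint regression task.
    print(sorted(reg))
```

Keeping the head outside the transfer is what lets the same backbone serve two tasks whose outputs have different shapes.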

Paper Structure (7 sections, 1 equation, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of EFD: Encoder detects Keypoint Heatmaps, Positive Vectors, and Feature Vectors. Fuser fuses elements and applies dimensionality expansion to avoid weight shift. Decoder uses a fully connected layer and the proposed RebuildNet to reconstruct the pressure map.
  • Figure 2: Structure of RebuildNet: (a) Overall structure, where the first four repeated layers form the network backbone and the final two layers form the network head, which can be swapped for different tasks. (b) Expansion Layer structure, which uses dilated convolutions of three different sizes to fully fuse features at different distances. (c) Exchange Layer structure, which ensures full fusion of features across different dimensions.
  • Figure 3: CRWT Schematic: The red line shows the negative impact of omitting CRWT when pre-training the weights, which leaves significant noise in the training outcomes. In contrast, the black line shows training with CRWT applied, demonstrating its effectiveness in improving training results.
  • Figure 4: Pressure Map vs. Reconstructed Pressure Map: White areas are points with pressure values, and black areas are points without pressure values.
  • Figure 5: Loss Line Graph: For clarity, only the final 40 epochs are shown. (a) L1 loss convergence curve. (b) L2 loss convergence curve, where SPMKD (excluding Dilated Convolution) partially overlaps with SPMKD (excluding Exchange Layer). (c) SSIM loss convergence curve.
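Figure 2(b) describes an Expansion Layer that fuses features at different distances using dilated convolutions of three different sizes. The plain-Python sketch below illustrates that idea under stated assumptions: single-channel input, 3×3 kernels, hypothetical dilation rates of 1, 2, and 3, zero "same" padding, and element-wise summation as the fusion step. The real rates, channel counts, and fusion operation are not given in this summary, and `dilated_conv2d` and `expansion_layer` are hypothetical names.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """'Same'-size 2D cross-correlation (as in most DL libraries) with an
    odd-sized square kernel, zero padding, and the given dilation rate."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = (k // 2) * dilation  # effective radius of the dilated kernel
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii = i - r + a * dilation
                    jj = j - r + b * dilation
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def expansion_layer(x: Grid, kernels: Sequence[Grid],
                    dilations: Sequence[int] = (1, 2, 3)) -> Grid:
    """Run parallel dilated convolutions and fuse them by element-wise sum
    (summation is an assumption; concatenation is another common choice)."""
    if len(kernels) != len(dilations):
        raise ValueError("need one kernel per dilation rate")
    branches = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    h, w = len(x), len(x[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]
```

Feeding a single impulse through `expansion_layer` with all-ones kernels shows each branch responding at a different distance from the impulse (1, 2, and 3 cells away), which is the widened, multi-distance receptive field the caption refers to, at the cost of only nine weights per branch.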