GenHPE: Generative Counterfactuals for 3D Human Pose Estimation with Radio Frequency Signals

Shuokang Huang, Julie A. McCann

TL;DR

This work tackles cross-domain 3D human pose estimation from radio frequency signals by eliminating domain-specific confounders through generative counterfactuals. GenHPE synthesizes counterfactual RF signals conditioned on manipulated skeletons, computes differences to isolate body-part effects, and regularizes a domain-invariant encoder-decoder to improve generalization across unseen subjects and environments. The approach, validated on WiFi, UWB, and mmWave datasets, with DDPM, DDIM, and CGAN variants, delivers state-of-the-art cross-domain accuracy and shows ablations that confirm the importance of skeleton embeddings and counterfactual regularization. The method offers a practical, privacy-preserving alternative to camera-based HPE, with strong potential for robust sensing in diverse real-world environments.

Abstract

Human pose estimation (HPE) detects the positions of human body joints for various applications. Compared to using cameras, HPE using radio frequency (RF) signals is non-intrusive and more robust to adverse conditions, exploiting the signal variations caused by human interference. However, existing studies focus on single-domain HPE and are confined by domain-specific confounders, so they fail to generalize to new domains and suffer diminished HPE performance. Specifically, the signal variations caused by different human body parts are entangled, containing subject-specific confounders. RF signals are also intertwined with environmental noise, involving environment-specific confounders. In this paper, we propose GenHPE, a 3D HPE approach that generates counterfactual RF signals to eliminate domain-specific confounders. GenHPE trains generative models conditioned on human skeleton labels, learning how human body parts and confounders interfere with RF signals. We manipulate skeleton labels (i.e., removing body parts) as counterfactual conditions for generative models to synthesize counterfactual RF signals. The differences between counterfactual signals approximately eliminate domain-specific confounders and regularize an encoder-decoder model to learn domain-independent representations. Such representations help GenHPE generalize to new subjects/environments for cross-domain 3D HPE. We evaluate GenHPE on three public datasets from WiFi, ultra-wideband, and millimeter wave. Experimental results show that GenHPE outperforms state-of-the-art methods and reduces estimation errors by up to 52.2 mm for cross-subject HPE and 10.6 mm for cross-environment HPE.
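
The counterfactual differencing idea in the abstract can be illustrated with a small toy example. This is only a sketch under strong assumptions, not the authors' implementation: `toy_generator` is a hypothetical stand-in for a trained conditional generative model, the body-part grouping in `BODY_PARTS` is invented, removed joints are represented as `None`, and the confounder is assumed to be purely additive (which is why it cancels exactly here, whereas the abstract only claims approximate elimination).

```python
import math

# Hypothetical grouping of a toy 5-joint skeleton into body parts (not from the paper).
BODY_PARTS = {"head": [0], "arms": [1, 2], "legs": [3, 4]}
SIGNAL_LEN = 4  # number of samples in the toy "RF signal"


def joint_effect(joint_idx, joint):
    """Toy contribution of one joint to every signal sample."""
    x, y, z = joint
    return [math.sin((joint_idx + 1) * (k + 1) * 0.3) * (x + y + z)
            for k in range(SIGNAL_LEN)]


def toy_generator(skeleton, confounder):
    """Stand-in for a conditional generator (assumption):
    signal = sum of per-joint effects + additive domain confounder."""
    signal = list(confounder)
    for idx, joint in enumerate(skeleton):
        if joint is None:  # a removed joint contributes nothing
            continue
        for k, value in enumerate(joint_effect(idx, joint)):
            signal[k] += value
    return signal


def remove_part(skeleton, part):
    """Counterfactual condition: the skeleton with one body part removed."""
    removed = set(BODY_PARTS[part])
    return [None if i in removed else joint for i, joint in enumerate(skeleton)]


def counterfactual_aggregate(skeleton, confounder):
    """Sum over body parts of (full signal - signal without that part)."""
    full = toy_generator(skeleton, confounder)
    total = [0.0] * SIGNAL_LEN
    for part in BODY_PARTS:
        without_part = toy_generator(remove_part(skeleton, part), confounder)
        for k in range(SIGNAL_LEN):
            total[k] += full[k] - without_part[k]
    return total


skeleton = [(0.0, 1.7, 0.0), (0.3, 1.4, 0.1), (-0.3, 1.4, 0.1),
            (0.1, 0.5, 0.0), (-0.1, 0.5, 0.0)]
env_a = [5.0, -2.0, 0.7, 3.3]   # confounder of one hypothetical environment
env_b = [-1.0, 4.0, 9.0, 0.0]   # confounder of another hypothetical environment

agg_a = counterfactual_aggregate(skeleton, env_a)
agg_b = counterfactual_aggregate(skeleton, env_b)
```

In this toy setting `agg_a` and `agg_b` agree up to floating-point error even though the two environments differ, because the confounder appears in both the full and the counterfactual signal and drops out of each difference. A learned generator would not cancel confounders this cleanly.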

Paper Structure

This paper contains 34 sections, 35 equations, 8 figures, 5 tables, 4 algorithms.

Figures (8)

  • Figure 1: Comparison between (a) existing 3D HPE methods and (b) GenHPE in cross-domain scenarios. Existing methods train HPE models confined by domain-specific confounders and suffer from severe performance degradation in new domains. GenHPE introduces a generative model to help eliminate domain-specific confounders and regularizes an encoder-decoder model to learn domain-independent representations for cross-domain 3D HPE.
  • Figure 1: The network architecture of the encoder.
  • Figure 2: Overview of the proposed GenHPE. Step 1: In the source domain, we train a generative model using ground-truth RF signals and skeleton labels. Step 2: We manipulate skeleton labels by removing each body part (e.g., arms, head, and thighs) and use them as counterfactual conditions for the generative model to synthesize counterfactual RF signals. The differences between counterfactual signals eliminate domain-specific confounders and are aggregated to approximate signals only related to body parts, regularizing an encoder-decoder model to learn domain-independent representations. Such representations generalize to both source and target domains for HPE.
  • Figure 2: The network architecture of the decoder.
  • Figure 3: Cumulative distributions of MPJPE (mm) for cross-subject/environment 3D HPE with WiFi method_piw_3d.
  • ...and 3 more figures
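
The MPJPE metric named in the Figure 3 caption is commonly defined as the mean Euclidean distance between predicted and ground-truth joint positions. A minimal sketch of that common definition follows. The function name `mpjpe` and the toy coordinates are illustrative, and the paper's exact evaluation protocol (for example, any alignment step) is not stated in this summary.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error: average Euclidean distance over joints.

    Both arguments are sequences of (x, y, z) tuples in the same unit (e.g., mm).
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("pose lists must be non-empty and of equal length")
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# Toy example: one joint is off by 5 mm (a 3-4-5 triangle), the other is exact.
target_pose = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
predicted_pose = [(3.0, 4.0, 0.0), (100.0, 0.0, 0.0)]
error_mm = mpjpe(predicted_pose, target_pose)  # (5 + 0) / 2 = 2.5
```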