RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark

Yuan-Hao Ho; Jen-Hao Cheng; Sheng Yao Kuan; Zhongyu Jiang; Wenhao Chai; Hsiang-Wei Huang; Chih-Lung Lin; Jenq-Neng Hwang

TL;DR

RT-Pose tackles privacy-sensitive 3D human pose estimation (HPE) by leveraging calibrated 4D radar tensors alongside LiDAR and RGB data. It introduces the RT-Pose dataset (72k frames, 240 sequences, 6 actions) and HRRadarPose, a single-stage baseline that learns high-resolution features directly from the 4D tensor, achieving a mean per joint position error (MPJPE) of 9.93 cm and a localization error (MRPE) of 9.91 cm on challenging scenes. The annotation workflow combines HRNet-based 2D poses, ZeDO-based 3D pose estimation, and LiDAR depth to produce accurate 3D skeletons, which are then refined manually. The dataset uses a single radar module to capture both vertical and horizontal cues, which simplifies setup while preserving performance; each 4D radar tensor has dimensions $64\times32\times128\times256$ along the velocity and spatial axes. Overall, RT-Pose demonstrates that raw 4D radar tensors provide richer information than radar point clouds for robust 3D HPE in complex, real-world conditions. It offers a benchmark and a strong baseline for future radar-based HPE methods, with potential impact on privacy-preserving, through-wall, and occlusion-robust applications.

Abstract

Traditional methods for human localization and pose estimation (HPE), which mainly rely on RGB images as the input modality, face substantial limitations in real-world applications due to privacy concerns. In contrast, radar-based HPE methods are a promising alternative: distinctive attributes such as through-wall recognition and privacy preservation make them better suited to practical deployment. This paper presents a Radar Tensor-based human pose (RT-Pose) dataset and an open-source benchmarking framework. The RT-Pose dataset comprises 4D radar tensors, LiDAR point clouds, and RGB images, and totals 72k frames across 240 sequences covering six actions of differing complexity. The 4D radar tensor provides raw spatio-temporal information, which differentiates RT-Pose from datasets based on radar point clouds. We develop an annotation process that uses RGB images and LiDAR point clouds to accurately label 3D human skeletons. In addition, we propose HRRadarPose, the first single-stage architecture that extracts a high-resolution representation of 4D radar tensors in 3D space to aid human keypoint estimation. HRRadarPose outperforms previous radar-based HPE work on the RT-Pose benchmark. The overall HRRadarPose performance on the RT-Pose dataset, a mean per joint position error (MPJPE) of 9.91 cm, indicates the persistent challenges of achieving accurate HPE in complex real-world scenarios. RT-Pose is available at https://huggingface.co/datasets/uwipl/RT-Pose.
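To make the reported metric concrete, here is a minimal sketch of how a mean per joint position error could be computed for one skeleton. It is an illustration under stated assumptions, not the benchmark's evaluation code: the function names (`mpjpe`, `root_relative`, `root_position_error`), the choice of joint 0 as the root, and the toy coordinates are all hypothetical. The root-position helper reflects the assumption that the localization error (MRPE) measures the distance between predicted and ground-truth root joints.

```python
import math
from typing import List, Sequence

Joint = Sequence[float]  # (x, y, z) in centimetres


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean Euclidean distance between matching predicted and true joints."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def root_relative(joints: Sequence[Joint], root_index: int = 0) -> List[List[float]]:
    """Express every joint relative to the root joint (assumed to be index 0)."""
    root = joints[root_index]
    return [[c - r for c, r in zip(j, root)] for j in joints]


def root_position_error(pred: Sequence[Joint], gt: Sequence[Joint],
                        root_index: int = 0) -> float:
    """Distance between predicted and true root joints (assumed MRPE per frame)."""
    return math.dist(pred[root_index], gt[root_index])


# Toy three-joint skeleton: the prediction has the right pose,
# but the whole body is shifted by (3, 4, 0) cm.
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 20.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (13.0, 4.0, 0.0), (3.0, 24.0, 0.0)]

absolute_error = mpjpe(pred, gt)                               # 5.0 cm
aligned_error = mpjpe(root_relative(pred), root_relative(gt))  # 0.0 cm
localization_error = root_position_error(pred, gt)             # 5.0 cm
```

In this toy case the pose itself is perfect once both skeletons are expressed relative to the root, so the whole error comes from localization. Whether a benchmark reports the pose error before or after such root alignment is a protocol choice that should be checked against its evaluation code.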

Paper Structure (18 sections, 1 equation, 9 figures, 8 tables)

Introduction
Related Works
3D Human Pose Estimation Datasets
Radar-based Human Pose Estimation
RT-Pose Dataset
Sensors
Data Collection
Data Processing
Annotation Workflow
HRRadarPose
Experiments
Evaluation metrics
Baselines Comparison
Action complexity Analysis
Qualitative Result
...and 3 more sections

Figures (9)

Figure 1: Experimental hardware setup in an indoor environment for data collection.
Figure 2: Data distribution of the RT-Pose dataset: (a) activities; (b) environmental conditions; (c) occlusion conditions.
Figure 3: Experimental instances across various indoor and outdoor conditions with diverse scenarios for data collection.
Figure 4: Radar signal processing flow. The green arrow line is for radar point cloud generation and the blue line is for 4D radar tensor generation.
Figure 5: Workflow of human localization and 3D pose ground-truth annotation. The 2D pose estimates predicted by the pre-trained HRNet model are denoted $P_{2d}$. The initial pose derived from LiDAR point clouds is denoted $P_{init}$. Both $P_{2d}$ and $P_{init}$ are inputs to ZeDO, an optimization-based pipeline for 3D pose estimation.
...and 4 more figures
