mmPred: Radar-based Human Motion Prediction in the Dark

Junqiao Fan, Haocong Rao, Jiarui Zhang, Jianfei Yang, Lihua Xie

TL;DR

mmPred tackles radar-based human motion prediction under challenging lighting and privacy-constrained conditions by introducing a diffusion-based framework guided by dual-domain historical representations. It combines Time-domain Pose Refinement and Frequency-domain Dominant Motion to provide robust historical context, and leverages a Global Skeleton-relational Transformer backbone to model joint-wise relationships and global motion patterns. Extensive experiments on mmBody and mm-Fi demonstrate state-of-the-art performance and strong robustness to radar artifacts like miss-detections and multipath, outperforming RGB-based HMP baselines and prior radar methods. The work highlights mmWave radar as a privacy-preserving, lighting-agnostic sensing modality for HMP and outlines practical considerations for real-time diffusion-based motion forecasting, with avenues for cross-domain generalization and direct radar-signal processing in future work.

Abstract

Existing Human Motion Prediction (HMP) methods based on RGB-D cameras are sensitive to lighting conditions and raise privacy concerns, limiting their use in real-world applications such as firefighting and healthcare. Motivated by the robustness and privacy-preserving nature of millimeter-wave (mmWave) radar, this work is the first to introduce radar as a sensing modality for HMP. However, radar signals often suffer from specular reflections and multipath effects, resulting in noisy and temporally inconsistent measurements, such as body-part miss-detection. To address these radar-specific artifacts, we propose mmPred, the first diffusion-based framework tailored for radar-based HMP. mmPred introduces a dual-domain historical motion representation to guide the generation process, combining a Time-domain Pose Refinement (TPR) branch for learning fine-grained details and a Frequency-domain Dominant Motion (FDM) branch for capturing global motion trends and suppressing frame-level inconsistency. Furthermore, we design a Global Skeleton-relational Transformer (GST) as the diffusion backbone to model global inter-joint cooperation, enabling corrupted joints to dynamically aggregate information from others. Extensive experiments show that mmPred achieves state-of-the-art performance, outperforming existing methods by 8.6% on mmBody and 22% on mm-Fi.
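To make the frequency-domain idea concrete, the following is a minimal sketch of how keeping only low-frequency components of a pose sequence retains the global motion trend while discarding frame-level jitter. It assumes a discrete cosine transform (DCT) along the time axis, a common choice in HMP work; the abstract does not state which transform mmPred uses, and the names `dct_basis`, `dominant_motion`, and `keep` are hypothetical and are not taken from the paper.

```python
import numpy as np


def dct_basis(num_frames: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (num_frames, num_frames).

    Row k holds the k-th cosine basis function sampled over time.
    """
    n = np.arange(num_frames)
    k = n.reshape(-1, 1)
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * num_frames))
    basis[0] *= np.sqrt(1.0 / num_frames)
    basis[1:] *= np.sqrt(2.0 / num_frames)
    return basis


def dominant_motion(poses: np.ndarray, keep: int) -> np.ndarray:
    """Low-pass a pose sequence in the DCT domain (illustrative assumption).

    poses: array of shape (T, J, 3), i.e. T frames of J 3D joints.
    keep:  number of lowest-frequency coefficients retained per coordinate.
    """
    num_frames = poses.shape[0]
    basis = dct_basis(num_frames)
    flat = poses.reshape(num_frames, -1)   # (T, J*3)
    coeffs = basis @ flat                  # time domain -> frequency domain
    coeffs[keep:] = 0.0                    # drop high-frequency components
    return (basis.T @ coeffs).reshape(poses.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 16).reshape(-1, 1, 1)
    clean = np.broadcast_to(t, (16, 17, 3)).copy()       # smooth linear motion
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)
    smooth = dominant_motion(noisy, keep=4)
    print("noisy error: ", np.abs(noisy - clean).mean())
    print("smooth error:", np.abs(smooth - clean).mean())
```

Truncating the coefficients is only the simplest possible filter. A learned branch such as FDM would presumably predict or refine the coefficients rather than zero them out, so this sketch illustrates the representation, not the method itself.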

Paper Structure

This paper contains 47 sections, 17 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: (a) mmWave radar-based HMP under darkness. (b) t-SNE visualization of joint locations and velocities from predicted historical poses. We compare our frequency-domain prediction with the state-of-the-art (SOTA) pose estimator [yang2023mm]. Our method produces more distinguishable velocity patterns across actions than existing methods.
  • Figure 2: System architecture: mmPred employs Dual-Domain History Estimation to extract historical motion representations in both the time (TPR) and frequency domains (FDM). These representations are fused via a feature fusion module to construct the condition embedding $C$, which guides the diffusion-based future motion prediction performed in the frequency domain.
  • Figure 3: Comparison of estimated motion history (colored) with GT (black) in both domains. We zoom in on the right-hand area and use red arrows to mark the joint velocity. TPR yields more accurate pose locations, while FDM offers more consistent velocity information.
  • Figure 4: HMP under adverse environments using RGB, mmWave (ours), and MoCap GT. The red boxes enclose estimated historical poses, and the blue boxes enclose the predicted future poses (predicted frames are stacked).
  • Figure 5: Qualitative visualization of our method compared to the baseline on the mmBody dataset, where red poses denote GT and colored poses denote our prediction. Red boxes enclose the 0.5s historical frames, and blue boxes enclose the 1s predicted future. Higher overlap between predicted and GT poses indicates higher accuracy.
  • ...and 9 more figures