mmPred: Radar-based Human Motion Prediction in the Dark
Junqiao Fan, Haocong Rao, Jiarui Zhang, Jianfei Yang, Lihua Xie
TL;DR
mmPred tackles radar-based human motion prediction (HMP) under challenging lighting and privacy-constrained conditions by introducing a diffusion-based framework guided by dual-domain historical representations. It combines a Time-domain Pose Refinement (TPR) branch and a Frequency-domain Dominant Motion (FDM) branch to provide robust historical context, and uses a Global Skeleton-relational Transformer (GST) backbone to model joint-wise relationships and global motion patterns. Extensive experiments on mmBody and mm-Fi demonstrate state-of-the-art performance and strong robustness to radar artifacts such as miss-detections and multipath, outperforming RGB-based HMP baselines and prior radar methods. The work highlights mmWave radar as a privacy-preserving, lighting-agnostic sensing modality for HMP and outlines practical considerations for real-time diffusion-based motion forecasting, with cross-domain generalization and direct radar-signal processing left as future work.
Abstract
Existing Human Motion Prediction (HMP) methods based on RGB-D cameras are sensitive to lighting conditions and raise privacy concerns, limiting their use in real-world applications such as firefighting and healthcare. Motivated by the robustness and privacy-preserving nature of millimeter-wave (mmWave) radar, this work is the first to introduce radar as a sensing modality for HMP. However, radar signals often suffer from specular reflections and multipath effects, resulting in noisy and temporally inconsistent measurements, such as body-part miss-detection. To address these radar-specific artifacts, we propose mmPred, the first diffusion-based framework tailored for radar-based HMP. mmPred introduces a dual-domain historical motion representation to guide the generation process, combining a Time-domain Pose Refinement (TPR) branch for learning fine-grained details and a Frequency-domain Dominant Motion (FDM) branch for capturing global motion trends and suppressing frame-level inconsistency. Furthermore, we design a Global Skeleton-relational Transformer (GST) as the diffusion backbone to model global inter-joint cooperation, enabling corrupted joints to dynamically aggregate information from other joints. Extensive experiments show that mmPred achieves state-of-the-art performance, outperforming existing methods by 8.6% on mmBody and 22% on mm-Fi.
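The idea behind a frequency-domain view of motion history can be illustrated with a small sketch. The abstract does not say which transform or how many coefficients the FDM branch uses, so the code below assumes a discrete cosine transform (DCT) along the time axis and a hard cut-off at `keep` coefficients; the function names `dct_matrix` and `dominant_motion` are hypothetical and not taken from the paper. It shows only the general principle that keeping low-frequency coefficients retains the global trend while damping frame-to-frame jitter.

```python
import numpy as np


def dct_matrix(num_frames):
    """Orthonormal DCT-II basis of shape (num_frames, num_frames)."""
    n = np.arange(num_frames)
    k = n[:, None]
    basis = np.sqrt(2.0 / num_frames) * np.cos(
        np.pi * (n[None, :] + 0.5) * k / num_frames
    )
    basis[0] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis


def dominant_motion(history, keep):
    """Low-pass a pose history of shape (frames, joints, 3) along time.

    Assumption for illustration: "dominant motion" is approximated by the
    first `keep` DCT coefficients of every joint coordinate.
    """
    num_frames = history.shape[0]
    flat = history.reshape(num_frames, -1)  # (frames, joints * 3)
    basis = dct_matrix(num_frames)
    coeffs = basis @ flat                   # time domain -> frequency domain
    coeffs[keep:] = 0.0                     # drop high-frequency components
    smoothed = basis.T @ coeffs             # inverse transform (orthonormal)
    return smoothed.reshape(history.shape)


# Usage: a static 17-joint pose observed for 16 frames with per-frame noise.
rng = np.random.default_rng(0)
pose = rng.normal(size=(1, 17, 3))
noisy = pose + 0.05 * rng.normal(size=(16, 17, 3))
trend = dominant_motion(noisy, keep=4)
```

In this toy setting the reconstructed `trend` has the same shape as the input and varies much less between consecutive frames than `noisy` does, which is the kind of frame-level inconsistency the abstract attributes to radar measurements.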
