RadHARSimulator V2: Video to Doppler Generator

Weicheng Gao

TL;DR

RadHARSimulator V2 tackles the scarcity of flexible radar HAR simulators by delivering a full pipeline that converts video into Doppler-rich radar spectra. The approach fuses computer-vision-driven 3D pose estimation (RTMDet+GNN for detection/tracking; HRNet for 2D pose; nearest-neighbor 3D pose matching; Kalman smoothing) with physics-based radar echo generation (Savitzky-Golay pose interpolation; wall/multipath modeling; RTM/DTM generation via pulse compression, MTI, and STFT; DnCNN denoising; ridge extraction). A hybrid parallel-serial network, SPNet, maps radar-domain representations to activity labels, and numerical experiments on the OSSet and RWSet datasets validate data fidelity and HAR performance, including ablations and robustness analyses. The work provides a practical, open-source framework for realistic radar HAR simulation and downstream recognition tasks, with potential impact on data augmentation, sensor design, and algorithm development.
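The GNN tracking step mentioned in the TL;DR can be illustrated with a minimal sketch. It assumes that each track and each RTMDet detection is summarised by a bounding-box centre in pixels, that the association cost is Euclidean distance, and that pairs beyond a hypothetical gate are never matched; `gnn_assign` and `gate` are illustrative names, not the project's code. Because GNN selects the single jointly cheapest assignment rather than matching tracks greedily one at a time, the sketch simply enumerates all pairings, which is only practical for a few people per frame.

```python
import math
from itertools import permutations


def gnn_assign(tracks, detections, gate=50.0):
    """Global nearest neighbour (GNN) association, brute-force version.

    tracks, detections: lists of (x, y) box centres in pixels (hypothetical input format).
    Returns {track_index: detection_index} for the one-to-one assignment with the
    lowest total cost. A pair farther apart than `gate` is never matched, and each
    unmatched track costs `gate`, so any valid match is at least as cheap as no match.
    """
    n_tracks, n_dets = len(tracks), len(detections)
    best_cost, best_pairs = math.inf, {}
    # None means "leave this track unassigned"; there are enough Nones for every track.
    slots = list(range(n_dets)) + [None] * n_tracks
    for choice in permutations(slots, n_tracks):
        cost, pairs = 0.0, {}
        for t, d in enumerate(choice):
            if d is None:
                cost += gate
                continue
            dist = math.dist(tracks[t], detections[d])
            if dist > gate:
                break  # gated out: reject this whole assignment
            cost += dist
            pairs[t] = d
        else:
            if cost < best_cost:
                best_cost, best_pairs = cost, pairs
    return best_pairs
```

For larger crowds, the same cost matrix would normally be handed to a Hungarian-algorithm solver instead of brute-force enumeration.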
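The nearest matching step that lifts 2D keypoints to 3D is not detailed here, so the following is only one plausible sketch. It assumes a library of candidate 3D poses with the same joint ordering as the 2D estimator's output, an orthographic camera (a 3D pose is projected by dropping its depth column), and that translation and scale are removed before comparing; `normalise`, `nearest_3d_pose`, and the pose library are hypothetical.

```python
import numpy as np


def normalise(pose):
    """Remove translation and scale: centre on the joint mean, scale to unit RMS radius."""
    centred = pose - pose.mean(axis=0)
    return centred / np.sqrt((centred ** 2).sum(axis=1).mean())


def nearest_3d_pose(pose_2d, library_3d):
    """Return (index, pose) of the library 3D pose whose 2D projection is closest.

    pose_2d: (J, 2) keypoints from the 2D estimator.
    library_3d: (N, J, 3) candidate poses; columns 0 and 1 are assumed to be the
    image-plane axes and column 2 the depth, i.e. an orthographic camera.
    """
    query = normalise(np.asarray(pose_2d, dtype=float))
    costs = [np.linalg.norm(normalise(candidate[:, :2]) - query)
             for candidate in library_3d]
    best = int(np.argmin(costs))
    return best, library_3d[best]
```

The matched pose comes back at library scale; a full pipeline would still need to restore the person's size and position in the scene.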
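Temporal smoothing of the estimated 3D poses with a Kalman filter can be sketched per joint coordinate. The constant-velocity state model, the noise parameters `q` and `r`, and the 30 fps frame interval are assumptions chosen for illustration, not values taken from the paper; `kalman_filter_track` is a hypothetical name.

```python
import numpy as np


def kalman_filter_track(z, dt=1 / 30, q=1e-2, r=1e-1):
    """Constant-velocity Kalman filter for one coordinate of one joint.

    z: (T,) noisy per-frame measurements; dt: frame interval in seconds.
    q: white-acceleration noise density; r: measurement-noise variance.
    Returns the filtered position estimates, shape (T,).
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])                  # state: [position, velocity]
    H = np.array([[1.0, 0.0]])                             # only position is observed
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    R = np.array([[r]])
    x = np.array([z[0], 0.0])                              # start at first measurement, at rest
    P = np.eye(2)
    out = np.empty(len(z))
    for k, zk in enumerate(z):
        x = F @ x                                          # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                                # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)                     # Kalman gain
        x = x + K @ (np.array([zk]) - H @ x)               # update with measurement
        P = (np.eye(2) - K @ H) @ P
        out[k] = x[0]
    return out
```

Running the filter independently on the x, y, and z coordinates of every joint gives a simple smoother; a joint-coupled model would be a more elaborate alternative.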

Abstract

Radar-based human activity recognition (HAR) still lacks a comprehensive simulation method. Existing simulation software relies on predefined models or motion-capture data, which limits its flexibility. To address this issue, this paper presents RadHARSimulator V2, a simulator that generates Doppler spectra directly from recorded video footage. The simulator comprises a computer vision module and a radar module. In the computer vision module, the real-time model for object detection (RTMDet), combined with global nearest neighbor (GNN) association, first detects and tracks human targets in the video. The high-resolution network (HRNet) then estimates the two-dimensional (2D) poses of the detected targets. Next, their three-dimensional (3D) poses are obtained by a nearest matching method. Finally, Kalman filtering yields temporally smooth 3D pose estimates. In the radar module, the poses are first interpolated and smoothed with the Savitzky-Golay method. Second, the delay model and the mirror method are used to simulate echoes in both free-space and through-the-wall scenarios. Third, the range-time map (RTM) is generated using pulse compression, moving target indication (MTI), and DnCNN. Next, the Doppler-time map (DTM) is generated using the short-time Fourier transform (STFT), again followed by DnCNN. Finally, ridge features are extracted from the DTM using the maximum local energy method. In addition, a hybrid parallel-serial neural network architecture is proposed for radar-based HAR. Numerical experiments demonstrate the effectiveness of both the simulator and the proposed network. The open-source code of this work is available at: https://github.com/JoeyBGOfficial/RadHARSimulatorV2-Video-to-Doppler-Generator.
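The Savitzky-Golay interpolation and smoothing step can be sketched as follows, under the assumption that each joint trajectory is first linearly resampled from the video frame rate to the radar pulse repetition frequency (PRF) and then smoothed; the window length, polynomial order, and helper names are hypothetical. The Savitzky-Golay weights come from a least-squares polynomial fit over a sliding window, so in the interior of the signal the filter leaves polynomials up to the chosen order unchanged.

```python
import numpy as np


def savgol_coeffs(window, order):
    """Savitzky-Golay smoothing weights (odd window): LS polynomial fit evaluated at the centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)   # columns: 1, x, x^2, ...
    return np.linalg.pinv(A)[0]                          # row giving the constant term


def savgol_smooth(y, window=11, order=3):
    """Apply the filter; edges are padded by repeating the end samples."""
    c = savgol_coeffs(window, order)
    padded = np.pad(np.asarray(y, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, c[::-1], mode="valid")    # correlation with c


def resample_joint(track, fps, prf, window=11, order=3):
    """Linearly resample a per-frame joint coordinate to the radar PRF, then smooth it."""
    t_video = np.arange(len(track)) / fps
    t_radar = np.arange(0.0, t_video[-1], 1.0 / prf)
    return t_radar, savgol_smooth(np.interp(t_radar, t_video, track), window, order)
```

SciPy users could swap `savgol_coeffs`/`savgol_smooth` for `scipy.signal.savgol_filter`; the hand-rolled version is shown only to make the least-squares construction explicit.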
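The delay model assigns each body point a round-trip delay and carrier phase, and a mirror (image) method adds multipath by reflecting the target across a wall plane. The sketch below is a simplified reading under stated assumptions: point scatterers, a single planar wall at y = `wall_y` on the far side of the target, unit amplitude for the direct path, a hypothetical reflection coefficient `refl`, and a 2.4 GHz carrier. It ignores range attenuation, wall penetration loss, and the extra delay a through-wall path would add; all function names are hypothetical.

```python
import numpy as np

C = 3e8  # propagation speed (m/s)


def mirror(point, wall_y):
    """Image of a point reflected across the wall plane y = wall_y."""
    p = np.array(point, dtype=float)
    p[1] = 2.0 * wall_y - p[1]
    return p


def path_lengths(radar, target, wall_y):
    """Round-trip path lengths (m) for the direct, single-bounce and double-bounce paths."""
    radar = np.asarray(radar, dtype=float)
    target = np.asarray(target, dtype=float)
    r_direct = np.linalg.norm(target - radar)
    r_image = np.linalg.norm(mirror(target, wall_y) - radar)  # radar -> wall -> target
    return 2.0 * r_direct, r_direct + r_image, 2.0 * r_image


def slow_time_echo(radar, scatterers, wall_y, fc=2.4e9, refl=0.5):
    """Baseband slow-time echo of point scatterers, one complex sample per pulse.

    scatterers: (M, K, 3) positions of K body points at M pulse times.
    Each path contributes amplitude * exp(-j 2 pi fc * path_length / C).
    """
    k = 2j * np.pi * fc / C
    echo = np.zeros(scatterers.shape[0], dtype=complex)
    for m, points in enumerate(scatterers):
        for point in points:
            d, s1, s2 = path_lengths(radar, point, wall_y)
            echo[m] += (np.exp(-k * d)
                        + 2.0 * refl * np.exp(-k * s1)   # two single-bounce orderings, same length
                        + refl**2 * np.exp(-k * s2))
    return echo
```

For a point approaching or receding at radial speed v, the direct-path phase rotates by 2π·(2v·fc/c)/PRF per pulse, which is the familiar Doppler shift 2v·fc/c; the multipath terms add their own, differently shifted copies.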
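Range-time map formation by pulse compression and MTI can be sketched with a linear frequency-modulated (LFM) pulse, FFT-based matched filtering, and a two-pulse canceller; the DnCNN denoising stage is omitted. The chirp parameters, the choice of a two-pulse canceller, and the function names are assumptions for illustration.

```python
import numpy as np


def lfm_pulse(bandwidth, duration, fs):
    """Complex baseband linear-FM chirp sweeping -B/2..+B/2 over the pulse."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = bandwidth / duration                                  # chirp rate (Hz/s)
    return np.exp(1j * np.pi * rate * (t - duration / 2) ** 2)


def pulse_compress(echo, pulse):
    """Matched-filter every pulse (row) by FFT cross-correlation with the chirp.

    Output column l corresponds to a delay of l fast-time samples.
    """
    n_fast = echo.shape[1]
    n_fft = n_fast + len(pulse) - 1                              # avoid circular wrap-around
    spectrum = np.fft.fft(echo, n_fft, axis=1) * np.conj(np.fft.fft(pulse, n_fft))
    return np.fft.ifft(spectrum, axis=1)[:, :n_fast]


def mti(rtm):
    """Two-pulse canceller along slow time: subtracts consecutive pulses."""
    return rtm[1:] - rtm[:-1]
```

A two-pulse canceller is the simplest MTI filter: it removes returns that are identical from pulse to pulse, such as walls and furniture, while keeping the moving body, at the cost of attenuating very slow motion.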
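The Doppler-time map can be sketched as the magnitude STFT of a slow-time signal, for example the RTM summed over the range bins that contain the person. The Hann window, window length, hop size, and the name `doppler_time_map` are assumptions, and the second DnCNN pass is omitted.

```python
import numpy as np


def doppler_time_map(x, prf, win=64, hop=8):
    """Magnitude STFT of a complex slow-time signal.

    Returns (dtm, freqs, times): dtm is (win, n_frames) with zero Doppler centred,
    freqs are the Doppler bins in Hz, times are frame centres in seconds.
    """
    w = np.hanning(win)
    starts = np.arange(0, len(x) - win + 1, hop)
    frames = np.stack([x[s:s + win] * w for s in starts], axis=1)   # (win, n_frames)
    dtm = np.fft.fftshift(np.fft.fft(frames, axis=0), axes=0)
    freqs = np.fft.fftshift(np.fft.fftfreq(win, d=1 / prf))
    times = (starts + win / 2) / prf
    return np.abs(dtm), freqs, times
```

Using the complex signal rather than its real part keeps approaching and receding motion on opposite sides of zero Doppler.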
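The maximum local energy method for ridge extraction is only named here, so the sketch below is one possible reading: local energy is taken as the sum of squared DTM magnitudes over a small Doppler neighbourhood, and each frame's ridge is the bin with the largest local energy, optionally restricted to stay near the previous frame's ridge. `extract_ridge`, `half_width`, and `max_jump` are hypothetical. Summing energy over a neighbourhood favours a broad micro-Doppler ridge over an isolated noise spike that a plain per-frame argmax might pick.

```python
import numpy as np


def extract_ridge(dtm, freqs, half_width=1, max_jump=None):
    """Per time frame, pick the Doppler bin whose neighbourhood holds the most energy.

    dtm: (F, T) magnitude map; freqs: (F,) Doppler axis in Hz.
    half_width: neighbourhood half-size in bins for the local-energy sum.
    max_jump: if given, restrict each frame's search to +/- max_jump bins of the previous ridge.
    """
    energy = np.asarray(dtm, dtype=float) ** 2
    kernel = np.ones(2 * half_width + 1)
    local = np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, energy)
    idx = np.empty(energy.shape[1], dtype=int)
    for t in range(energy.shape[1]):
        col = local[:, t]
        if max_jump is None or t == 0:
            idx[t] = int(np.argmax(col))
        else:
            lo = max(idx[t - 1] - max_jump, 0)
            hi = min(idx[t - 1] + max_jump + 1, len(col))
            idx[t] = lo + int(np.argmax(col[lo:hi]))
    return np.asarray(freqs)[idx]
```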

Paper Structure

This paper contains 21 sections, 46 equations, 16 figures, and 8 tables.

Figures (16)

  • Figure 1: Splash screen of RadHARSimulator V2.
  • Figure 2: The interface and processing workflow of RadHARSimulator V2.
  • Figure 3: Structure design of the RTMDet for detecting human targets in video frames.
  • Figure 4: Human tracking based on GNN.
  • Figure 5: Structure design of the HRNet for estimating 2D poses of the detected human targets.