Sim2Radar: Toward Bridging the Radar Sim-to-Real Gap with VLM-Guided Scene Reconstruction

Emily Bejerano; Federico Tondolo; Aayan Qayyum; Xiaofan Yu; Xiaofan Jiang

TL;DR

Sim2Radar is an end-to-end framework that synthesizes radar training data directly from single-view RGB images, enabling scalable data generation without manual scene modeling. The results suggest that physics-based, vision-driven radar simulation can provide effective geometric priors for radar learning and measurably improve performance under limited real-data supervision.

Abstract

Millimeter-wave (mmWave) radar provides reliable perception in visually degraded indoor environments (e.g., smoke, dust, and low light), but learning-based radar perception is bottlenecked by the scarcity and cost of collecting and annotating large-scale radar datasets. We present Sim2Radar, an end-to-end framework that synthesizes training radar data directly from single-view RGB images, enabling scalable data generation without manual scene modeling. Sim2Radar reconstructs a material-aware 3D scene by combining monocular depth estimation, segmentation, and vision-language reasoning to infer object materials, then simulates mmWave propagation with a configurable physics-based ray tracer using Fresnel reflection models parameterized by ITU-R electromagnetic properties. Evaluated on real-world indoor scenes, Sim2Radar improves downstream 3D radar perception via transfer learning: pre-training a radar point-cloud object detection model on synthetic data and fine-tuning on real radar yields up to +3.7 3D AP (IoU 0.3), with gains driven primarily by improved spatial localization. These results suggest that physics-based, vision-driven radar simulation can provide effective geometric priors for radar learning and measurably improve performance under limited real-data supervision.
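To make the "Fresnel reflection models" step concrete, the following is a minimal sketch of how a ray tracer might turn a material's complex relative permittivity into reflection coefficients at an air-to-material boundary. It is not the authors' implementation. The function names are hypothetical, the permittivity in the usage lines is an illustrative placeholder rather than a value taken from the ITU-R tables, and the surface is assumed to be a smooth, flat, thick slab.

```python
import cmath
import math


def fresnel_coefficients(eps_r: complex, theta_deg: float) -> tuple[complex, complex]:
    """Return (r_te, r_tm) for a wave in air hitting a flat dielectric.

    eps_r     : complex relative permittivity of the material (assumed input)
    theta_deg : incidence angle measured from the surface normal, in degrees
    """
    theta = math.radians(theta_deg)
    cos_t = math.cos(theta)
    # Shared term: sqrt(eps_r - sin^2(theta)), from Snell's law
    root = cmath.sqrt(eps_r - math.sin(theta) ** 2)
    r_te = (cos_t - root) / (cos_t + root)                   # perpendicular polarization
    r_tm = (eps_r * cos_t - root) / (eps_r * cos_t + root)   # parallel polarization
    return r_te, r_tm


def reflected_power_fraction(eps_r: complex, theta_deg: float) -> float:
    """Fraction of incident power reflected, averaged over both polarizations."""
    r_te, r_tm = fresnel_coefficients(eps_r, theta_deg)
    return 0.5 * (abs(r_te) ** 2 + abs(r_tm) ** 2)


if __name__ == "__main__":
    # Placeholder permittivity for a lossy wall-like material (illustrative only)
    eps_wall = 5.3 - 0.5j
    for angle in (0.0, 30.0, 60.0, 85.0):
        print(angle, round(reflected_power_fraction(eps_wall, angle), 3))
```

In a pipeline like the one described, the material label inferred for each surface would select the permittivity passed to such a function, and the resulting coefficient would scale the power carried by each reflected ray.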

Paper Structure (30 sections, 1 equation, 4 figures, 1 table)

INTRODUCTION
RELATED WORK
Radar Simulation
Generative AI For RF Sensing
mmWave Perception And Object Recognition
Why VLMs For Material Classification?
Indoor Radar Dataset
VLM-ASSISTED SCENE RECONSTRUCTION
Input Requirements
Stage 1: Monocular Depth Estimation
Stage 2: Automatic Segmentation And Detection
Stage 3: VLM Material Classification
Stage 4: 3D Projection And Output
PHYSICS-BASED RADAR SIMULATION
Material Electromagnetic Properties
...and 15 more sections

Figures (4)

Figure 1: VLM-assisted scene reconstruction pipeline. (a) RGB and sparse LiDAR input. (b) Dense depth from MoGe. (c) SAM2 masks and Grounding DINO detections. (d) InternVL2.5-8B material labels. (e) Material-labeled 3D scene ready for ray tracing.
Figure 2: Sim-real radar comparison. Blue: real radar (2,057 points). Red: simulated radar (251 points). Density ratio: 12%. Despite density differences, both capture similar spatial structure including walls and doors.
Figure 3: Sim pre-training consistently improves 3D AP across all data regimes.
Figure 4: Proposed contrastive alignment framework for future work, where a radar encoder trained on synthetic data is aligned with text-based physical descriptions and later adapted to real radar data in a shared embedding space.
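The last stage of the reconstruction pipeline in Figure 1 lifts per-pixel depth and material labels into a material-labeled 3D scene. A minimal sketch of that lifting step is shown here, assuming a standard pinhole camera model. The function name and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and the paper's actual projection code may differ.

```python
import numpy as np


def backproject(depth, labels, fx, fy, cx, cy):
    """Lift a depth map and per-pixel labels to a labeled 3D point set.

    depth  : (H, W) array of metric depth along the camera z-axis
    labels : (H, W) array of integer material IDs (hypothetical encoding)
    fx, fy, cx, cy : pinhole intrinsics in pixels (assumed known)
    Returns (points, point_labels) with shapes (N, 3) and (N,).
    """
    depth = np.asarray(depth, dtype=float)
    labels = np.asarray(labels)
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Drop pixels with missing or non-physical depth
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1), labels[valid]


if __name__ == "__main__":
    depth = np.array([[2.0, 0.0], [2.0, 2.0]])
    labels = np.array([[1, 2], [3, 4]])
    pts, lab = backproject(depth, labels, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    print(pts)
    print(lab)
```

Each returned point keeps the material ID of the pixel it came from, which is the information a downstream ray tracer would need to look up electromagnetic properties per surface.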
