Learning Spatial Structure from Pre-Beamforming Per-Antenna Range-Doppler Radar Data via Visibility-Aware Cross-Modal Supervision

George Sebastian, Philipp Berthold, Bianca Forkel, Leon Pohl, Mirko Maehlisch

Abstract

Automotive radar perception pipelines commonly construct angle-domain representations via beamforming before applying learning-based models. This work instead investigates a representational question: can meaningful spatial structure be learned directly from pre-beamforming per-antenna range-Doppler (RD) measurements? Experiments are conducted on a 6-TX × 8-RX (48 virtual antennas) commodity automotive radar employing an A/B chirp-sequence frequency-modulated continuous-wave (CS-FMCW) transmit scheme, in which the effective transmit aperture varies between chirps (single-TX vs. multi-TX), enabling controlled analysis of chirp-dependent transmit configurations. We operate on pre-beamforming per-antenna RD tensors using a dual-chirp shared-weight encoder trained in an end-to-end, fully data-driven manner, and evaluate spatial recoverability using bird's-eye-view (BEV) occupancy as a geometric probe rather than a performance-driven objective. Supervision is visibility-aware and cross-modal, derived from LiDAR with explicit modeling of the radar field-of-view and occlusion-aware LiDAR observability via ray-based visibility. Through chirp ablations (A-only, B-only, A+B), range-band analysis, and physics-aligned baselines, we assess how transmit configurations affect geometric recoverability. The results indicate that spatial structure can be learned directly from pre-beamforming per-antenna RD tensors without explicit angle-domain construction or hand-crafted signal-processing stages.
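To make the input/output interface concrete, the following is a minimal PyTorch sketch of the kind of model the abstract describes: a shared-weight encoder applied separately to the chirp-A and chirp-B per-antenna RD tensors, followed by a convolutional RD-to-BEV occupancy head. All names, layer choices, channel counts, and tensor shapes here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed shapes/layers, not the authors' code): dual-chirp
# shared-weight encoder over pre-beamforming per-antenna range-Doppler (RD)
# tensors, with a convolutional RD-to-BEV occupancy head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualChirpRDEncoder(nn.Module):
    def __init__(self, n_virtual=48, feat=64, bev_hw=(128, 128)):
        super().__init__()
        in_ch = 2 * n_virtual  # complex RD stacked as (real, imag) per antenna
        # 1x1 convs mix antenna channels ("learned spatial mixing");
        # 3x3 convs mix neighboring range-Doppler bins.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.bev_hw = bev_hw
        self.head = nn.Sequential(
            nn.Conv2d(2 * feat, feat, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, kernel_size=1),  # BEV occupancy logits
        )

    def forward(self, rd_a, rd_b):
        # rd_a, rd_b: (B, 2*n_virtual, n_range, n_doppler) for chirps A and B
        fa = self.encoder(rd_a)            # same weights for both chirps
        fb = self.encoder(rd_b)
        f = torch.cat([fa, fb], dim=1)     # fuse A+B chirp features
        f = F.interpolate(f, size=self.bev_hw, mode="bilinear",
                          align_corners=False)
        return self.head(f)                # (B, 1, H_bev, W_bev)

model = DualChirpRDEncoder()
rd_a = torch.randn(1, 96, 256, 128)  # batch, 2*48 channels, range, Doppler
rd_b = torch.randn(1, 96, 256, 128)
print(model(rd_a, rd_b).shape)       # torch.Size([1, 1, 128, 128])
```

Single-chirp ablations (A-only, B-only) would simply feed one branch; the shared weights are what allow the same encoder to be probed under each transmit configuration.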

Figures (7)

  • Figure A1: Despite ambiguous RD observations (near-zero Doppler), spatial structure is recoverable in BEV without explicit angle-domain processing.
  • Figure A2: Overview of the proposed approach. Top: Prediction from pre-beamforming per-antenna range-Doppler (RD) tensors using learned spatial mixing and a convolutional RD-to-BEV mapping, without explicit angle-domain processing. Bottom: Visibility-aware cross-modal supervision constructed by intersecting the radar horizontal field-of-view (HFOV) with the LiDAR observability mask. In the supervision mask, LiDAR-observable occupied (yellow) and free (teal) cells within the radar HFOV are retained, while LiDAR-unobservable regions (purple) within the HFOV are treated as unknown and excluded from supervision. The radar HFOV is shown in white (black denotes outside-HFOV regions in the intermediate mask), and regions outside the HFOV are not considered during training or evaluation. Training uses a masked focal loss restricted to the valid region (a hedged sketch of this mask construction and loss follows the figure list), enabling evaluation of geometric recoverability from RD as a probe task.
  • Figure C1: Sensor setup with radar (red), LiDAR (blue), and camera (green).
  • Figure E1: Comparison between the LiDAR BEV ground truth, the range-energy (radial) baseline, and the proposed method. The baseline produces radially symmetric responses due to the absence of angular information, while the proposed model recovers spatially localized structure from pre-beamforming RD (a minimal sketch of such a radial baseline follows this list).
  • Figure E2: Qualitative radar BEV occupancy predictions from pre-beamforming RD tensors. Each row shows (from left to right) the camera image (for visualization only), RD magnitude (single RX, chirp A; range on the vertical axis, Doppler on the horizontal axis, zero Doppler centered), LiDAR BEV ground truth within the visible region, and the radar BEV prediction. For visualization, RD is shown for a single receive channel and chirp, with Doppler centered via FFT shift and the magnitude log-compressed and normalized (a small sketch of this display transform follows the list), while the model operates on all receive channels and both chirps (A+B) using the native RD representation without these visualization transformations. In the LiDAR BEV ground-truth (GT) panel, occupied cells are shown in yellow, free space in teal, and LiDAR-unobservable regions within the radar HFOV in purple (unknown and excluded from supervision); regions outside the radar HFOV are not considered during training or evaluation, and in the radar BEV prediction, non-occupied cells are shown in purple. Radar predictions recover coherent spatial responses for large structures while exhibiting more diffuse, blob-like responses compared to LiDAR. Notably, radar frequently produces responses in regions that are occluded or unobserved in LiDAR, which may arise from multipath propagation and differences in sensing physics, indicating that the model captures radar-specific phenomena beyond the LiDAR supervision.
  • ...and 2 more figures
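The Figure A2 caption above describes supervision restricted to LiDAR-observable cells inside the radar HFOV, trained with a masked focal loss. Below is a hedged sketch of that recipe. The specific focal-loss form, function names, and tensor conventions are assumptions; the paper states only that a masked focal loss is applied over the valid region.

```python
# Hedged sketch (assumed loss form and conventions, not the authors' code):
# visibility-aware supervision mask plus a masked binary focal loss.
import torch
import torch.nn.functional as F

def supervision_mask(hfov_mask, lidar_observable):
    """A BEV cell contributes to the loss only if it lies inside the radar
    horizontal FOV AND is LiDAR-observable under ray-based visibility.
    All other cells are treated as unknown and excluded."""
    return (hfov_mask > 0) & (lidar_observable > 0)

def masked_focal_loss(logits, target, valid, alpha=0.25, gamma=2.0):
    """Binary focal loss averaged over valid cells only (assumed form).
    logits: (B, H, W) BEV occupancy logits; target: float {0., 1.} labels;
    valid: boolean mask from supervision_mask()."""
    ce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p = torch.sigmoid(logits)
    p_t = target * p + (1 - target) * (1 - p)             # prob. of true class
    alpha_t = target * alpha + (1 - target) * (1 - alpha)
    loss = alpha_t * (1.0 - p_t) ** gamma * ce
    loss = loss * valid.float()                           # zero out unknown cells
    return loss.sum() / valid.float().sum().clamp(min=1.0)
```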
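Figure E1 compares against a range-energy (radial) baseline. A minimal sketch of such a baseline, under assumed grid parameters, is below: total per-range energy from the RD map is spread uniformly over azimuth, so the BEV response is radially symmetric by construction, since no angular information is used.

```python
# Hypothetical radial baseline (assumed parameters, not the paper's exact one).
import numpy as np

def range_energy_baseline(rd_mag, bev=128, cell_m=0.5, range_res_m=0.6):
    """Spread per-range-bin RD energy uniformly over azimuth in BEV.
    rd_mag: (n_range, n_doppler) magnitude map."""
    energy = (rd_mag ** 2).sum(axis=1)               # energy per range bin
    ys, xs = np.indices((bev, bev))
    cy, cx = bev - 1, bev // 2                       # sensor at bottom center
    r_m = np.hypot((ys - cy) * cell_m, (xs - cx) * cell_m)
    r_bin = np.minimum((r_m / range_res_m).astype(int), len(energy) - 1)
    return energy[r_bin]                             # constant on each range ring
```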
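Finally, the Figure E2 caption notes that RD panels are shown with Doppler centered via FFT shift and magnitudes log-compressed and normalized, purely for display. A small NumPy sketch of that display transform (the exact compression and normalization constants are assumptions):

```python
import numpy as np

def rd_for_display(rd_complex):
    """Visualization-only transform (assumed convention per Fig. E2):
    center zero Doppler with an FFT shift along the Doppler axis, then
    log-compress and normalize the magnitude. The model itself consumes
    the native RD representation without these steps."""
    rd = np.fft.fftshift(rd_complex, axes=-1)   # zero Doppler to the center
    mag = 20.0 * np.log10(np.abs(rd) + 1e-6)    # magnitude in dB
    mag -= mag.min()
    return mag / max(mag.max(), 1e-6)           # scale to [0, 1]
```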