Radar Spectra-Language Model for Automotive Scene Parsing
Mariia Pushkareva, Yuri Feldman, Csaba Domokos, Kilian Rambach, Dotan Di Castro
TL;DR
This work tackles the interpretability and utility of automotive radar spectra by introducing a radar spectra-language model (RSLM) that aligns radar spectrum embeddings with a frozen vision-language model (VLM). By training a radar encoder to match image embeddings from automotive captions without requiring labeled radar data, the approach enables free-text querying of spectra and semantic retrieval of scene elements. The study demonstrates that RSLM embeddings can boost downstream tasks, improving object detection and free-space segmentation when injected into a baseline detector, and shows improved scene retrieval compared to non-fine-tuned baselines. The results suggest a practical path toward leveraging radar spectra for robust, weather-resilient autonomous driving, with limitations tied to caption quality and the need for more diverse automotive data.
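The alignment idea can be illustrated with a minimal sketch. It assumes that each radar spectrum is paired with a camera image whose embedding comes from the frozen VLM, and that the radar encoder is trained to reduce the cosine distance between the two. The loss choice and the names `cosine_similarity` and `alignment_loss` are assumptions made for illustration; the summary above does not specify the training objective.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_loss(radar_embeddings, image_embeddings):
    """Mean cosine distance over paired radar / image embeddings.

    Hypothetical objective: the image embeddings are treated as fixed
    targets (frozen VLM), so only the radar encoder would be updated.
    """
    distances = [
        1.0 - cosine_similarity(r, i)
        for r, i in zip(radar_embeddings, image_embeddings)
    ]
    return sum(distances) / len(distances)
```

Under this assumed objective the loss is 0 when every radar embedding points in the same direction as its paired image embedding, and grows as the two drift apart.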
Abstract
Radar sensors are low-cost, long-range, and weather-resilient. Therefore, they are widely used for driver assistance functions and are expected to be crucial for the success of autonomous driving in the future. In many perception tasks, only pre-processed radar point clouds are considered. In contrast, radar spectra are a raw form of radar measurements and contain more information than radar point clouds. However, radar spectra are rather difficult to interpret. In this work, we aim to explore the semantic information contained in spectra in the context of automated driving, thereby moving towards better interpretability of radar spectra. To this end, we create a radar spectra-language model, allowing us to query radar spectra measurements for the presence of scene elements using free text. We overcome the scarcity of radar spectra data by matching the embedding space of an existing vision-language model. Finally, we explore the benefit of the learned representation for scene retrieval using radar spectra only, and obtain improvements in free-space segmentation and object detection merely by injecting the spectra embedding into a baseline model.
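Because the spectra embeddings share a space with the vision-language model, a free-text query can in principle be answered by ranking spectra by their similarity to the query's text embedding. The sketch below assumes the embeddings have already been computed and uses cosine similarity for ranking; `rank_spectra` and `normalize` are hypothetical helper names, not part of the authors' code.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def rank_spectra(text_embedding, spectra_embeddings):
    """Return spectrum indices ordered from most to least similar to the query.

    Assumes the text embedding and the spectra embeddings live in the
    same space, which is what matching the VLM embedding space provides.
    """
    query = normalize(text_embedding)
    scores = [
        sum(a * b for a, b in zip(query, normalize(s)))
        for s in spectra_embeddings
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For scene retrieval, the top-ranked indices would be the radar frames most likely to contain the scene element described in the query.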
