S$^3$E: Self-Supervised State Estimation for Radar-Inertial System
Shengpeng Wang, Yulong Xie, Qing Liao, Wei Wang
TL;DR
S$^3$E tackles radar-inertial state estimation under sparse radar measurements by fusing Range-Azimuth Spectrum signals with IMU data in a self-supervised framework. The core contributions are a Rotation-based Cross Fusion that maps rotational motion into azimuth shifts to enrich radar features, a Consistent Landmark Extractor for differentiable landmark tracking, and a Differentiable Velocity Estimation module that uses Doppler information to infer instantaneous velocity without ground-truth labels. A joint Self-Supervised Loss enforces geometric and kinematic consistency, enabling robust pose estimation and geometry-aware landmark extraction on low-resolution radar data. Experiments on ColoRadar and self-collected datasets show improved accuracy and denser landmark maps with fewer ghost points, suggesting practical value for radar-inertial localization in diverse environments.
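To make the idea of inferring instantaneous velocity from Doppler information concrete, the following is a minimal sketch of a classical least-squares estimate, not the paper's Differentiable Velocity Estimation module. It assumes a planar setting, static landmarks, and the sign convention that a landmark at azimuth $\theta$ is observed with radial velocity $v_r = -(v_x \cos\theta + v_y \sin\theta)$; the function name `estimate_ego_velocity` and all numeric values are hypothetical.

```python
import numpy as np

def estimate_ego_velocity(azimuths_rad, doppler_mps):
    """Least-squares 2D ego-velocity from Doppler readings of static landmarks.

    Assumed model (an illustration, not taken from the paper):
    v_r = -(vx * cos(theta) + vy * sin(theta)) for a landmark at azimuth theta.
    """
    azimuths_rad = np.asarray(azimuths_rad, dtype=float)
    doppler_mps = np.asarray(doppler_mps, dtype=float)
    # Each row is the unit line-of-sight direction to one landmark.
    directions = np.stack([np.cos(azimuths_rad), np.sin(azimuths_rad)], axis=1)
    # Solve directions @ v = -doppler in the least-squares sense.
    velocity, *_ = np.linalg.lstsq(directions, -doppler_mps, rcond=None)
    return velocity

# Synthetic check: generate noise-free Doppler readings from a known velocity.
true_v = np.array([1.5, -0.2])
theta = np.deg2rad([-40.0, -10.0, 5.0, 30.0, 55.0])
doppler = -(np.cos(theta) * true_v[0] + np.sin(theta) * true_v[1])
est_v = estimate_ego_velocity(theta, doppler)
```

Because every step is a linear-algebra operation, the same computation could in principle be written in an autodiff framework so that gradients flow back to the landmark azimuths, which is one plausible reading of "differentiable" here.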
Abstract
Millimeter-wave radar for state estimation is gaining significant attention for its affordability and reliability in harsh conditions. Existing localization solutions typically rely on post-processed radar point clouds as landmark points. Nonetheless, the inherent sparsity of radar point clouds, ghost points from multi-path effects, and limited angular resolution in single-chirp radar severely degrade state estimation performance. To address these issues, we propose S$^3$E, a \textbf{S}elf-\textbf{S}upervised \textbf{S}tate \textbf{E}stimator that uses information-rich radar signal spectra to bypass sparse points and fuses complementary inertial information to achieve accurate localization. S$^3$E fully explores the association between the \textit{exteroceptive} radar and the \textit{proprioceptive} inertial sensor so that each compensates for the other's weaknesses. To deal with limited angular resolution, we introduce a novel cross-fusion technique that enhances spatial structure information by exploiting subtle rotational shift correlations across heterogeneous data. The experimental results demonstrate that our method achieves robust and accurate performance without relying on localization ground truth supervision. To the best of our knowledge, this is the first attempt to achieve state estimation by fusing radar spectra and inertial data in a complementary self-supervised manner.
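The rotational shift correlation mentioned above can be illustrated with a small sketch: a yaw rotation measured by the gyroscope moves static scene content along the azimuth axis of a range-azimuth spectrum. This is only an illustration under stated assumptions, not the paper's cross-fusion network. It assumes uniform azimuth bins, a sign convention in which positive yaw moves content toward lower azimuth indices, and a circular shift (a real radar has a limited field of view, so wrap-around is a simplification). The names `yaw_to_azimuth_bins` and `shift_spectrum` are hypothetical.

```python
import numpy as np

def yaw_to_azimuth_bins(yaw_rate_rad_s, dt_s, bin_width_rad):
    """Convert an integrated yaw rotation into a (fractional) azimuth-bin shift.

    Assumed convention: positive sensor yaw makes static content appear
    at smaller azimuth indices, hence the minus sign.
    """
    return -yaw_rate_rad_s * dt_s / bin_width_rad

def shift_spectrum(spectrum, shift_bins):
    """Circularly shift a (range, azimuth) spectrum by a whole number of bins."""
    return np.roll(spectrum, int(round(shift_bins)), axis=1)

# Toy spectrum with a single reflector at range bin 2, azimuth bin 5.
spectrum = np.zeros((4, 8))
spectrum[2, 5] = 1.0

# 40 deg/s for 0.1 s is a 4 degree rotation, i.e. 2 bins of 2 degrees each.
shift = yaw_to_azimuth_bins(np.deg2rad(40.0), 0.1, np.deg2rad(2.0))
predicted = shift_spectrum(spectrum, shift)
```

Comparing such an IMU-predicted spectrum with the next measured spectrum is one simple way to see why inertial rotation carries information about azimuth structure that a low-resolution radar cannot resolve on its own.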
