S$^3$E: Self-Supervised State Estimation for Radar-Inertial System

Shengpeng Wang, Yulong Xie, Qing Liao, Wei Wang

TL;DR

S$^3$E tackles radar-inertial state estimation under sparse radar measurements by fusing Range-Azimuth Spectrum signals with IMU data in a self-supervised framework. The core contributions are a Rotation-based Cross Fusion that maps rotational motion into azimuth shifts to enrich radar features, a Consistent Landmark Extractor for differentiable landmark tracking, and a Differentiable Velocity Estimation module that leverages Doppler information to infer instantaneous velocity without ground-truth labels. A joint Self-Supervised Loss enforces geometric and kinematic consistency, enabling robust pose estimation and geometry-aware landmark extraction on low-resolution radar data. Empirical results on ColoRadar and self-collected datasets show improved accuracy and denser landmark maps with fewer ghost points, indicating strong practical impact for radar-inertial localization in diverse environments.
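The idea that rotational motion maps into azimuth shifts can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's Rotation-based Cross Fusion module: the function names `yaw_to_azimuth_shift` and `shift_azimuth` are hypothetical, the azimuth bins are assumed uniform in angle (real FFT-based spectra are often uniform in the sine of the angle), only pure yaw rotation is considered, and the sign convention (a positive yaw moves a static scene toward lower azimuth bins) is assumed.

```python
import math

def yaw_to_azimuth_shift(yaw_rates, dt, bin_width_rad):
    """Integrate gyro yaw rates (rad/s) over one radar frame interval and
    express the resulting rotation as a shift measured in azimuth bins."""
    delta_yaw = sum(w * dt for w in yaw_rates)  # rectangular integration
    # Assumed convention: a static scene appears rotated by -delta_yaw
    # in the sensor frame when the sensor itself rotates by +delta_yaw.
    return -delta_yaw / bin_width_rad

def shift_azimuth(ras, shift_bins):
    """Shift every range row of a range-azimuth spectrum along the azimuth
    axis by a whole number of bins. Bins that leave the field of view are
    dropped and newly exposed bins are zero-filled."""
    k = round(shift_bins)
    n_az = len(ras[0])
    shifted = []
    for row in ras:
        new_row = [0.0] * n_az
        for j, power in enumerate(row):
            if 0 <= j + k < n_az:
                new_row[j + k] = power
        shifted.append(new_row)
    return shifted

# Toy spectrum: 2 range bins x 8 azimuth bins with one strong reflector each.
ras = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0],
]
# 10 IMU samples at 100 Hz with a yaw rate of 40 deg/s give 4 deg of rotation.
yaw_rates = [math.radians(40.0)] * 10
shift = yaw_to_azimuth_shift(yaw_rates, dt=0.01, bin_width_rad=math.radians(2.0))
predicted = shift_azimuth(ras, shift)
```

With 2-degree bins, the 4-degree rotation moves each peak by two bins, so the IMU-predicted spectrum can be compared with the next measured spectrum. A learned fusion would operate on features rather than on raw power and would need sub-bin interpolation, which this sketch omits.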

Abstract

Millimeter-wave radar for state estimation is gaining significant attention for its affordability and reliability in harsh conditions. Existing localization solutions typically rely on post-processed radar point clouds as landmark points. Nonetheless, the inherent sparsity of radar point clouds, ghost points from multi-path effects, and limited angle resolution in single-chirp radar severely degrade state estimation performance. To address these issues, we propose S$^3$E, a \textbf{S}elf-\textbf{S}upervised \textbf{S}tate \textbf{E}stimator that employs information-rich radar signal spectra to bypass sparse points and fuses complementary inertial information to achieve accurate localization. S$^3$E fully explores the association between the \textit{exteroceptive} radar and the \textit{proprioceptive} inertial sensor to achieve complementary benefits. To deal with limited angle resolution, we introduce a novel cross-fusion technique that enhances spatial structure information by exploiting subtle rotational shift correlations across heterogeneous data. The experimental results demonstrate that our method achieves robust and accurate performance without relying on ground-truth localization supervision. To the best of our knowledge, this is the first attempt to achieve state estimation by fusing radar spectra and inertial data in a complementary self-supervised manner.

Paper Structure

This paper contains 15 sections, 9 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: S$^3$E fully explores the complementary benefits of the exteroceptive radar and the proprioceptive inertial sensor to achieve accurate state estimation.
  • Figure 2: Subtle shifts in the rotational components map to linear translations of the power in the RAS.
  • Figure 3: Overview of S$^3$E. Given a pair of RAS and inertial data, it outputs accurate poses and geometry-consistent landmarks.
  • Figure 4: Schematic of Rotation-based Cross Fusion.
  • Figure 5: Schematic of estimating the vehicle's instantaneous velocity in a differentiable manner. Left: the kinematic constraint relating the vehicle velocity $^{\boldsymbol{I}}\boldsymbol{v}_k$ to the azimuth angles and relative radial velocities of stationary landmarks. Right: differentiable least squares solves for the instantaneous velocity and filters out dynamic targets and clutter.
  • ...and 5 more figures
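
The kinematic constraint described in the Figure 5 caption can be sketched as a small least-squares problem. This is an illustrative sketch under stated assumptions, not the paper's Differentiable Velocity Estimation module: it assumes a planar model in which a stationary landmark at azimuth $\theta_i$ is observed with relative radial velocity $v_{r,i} = -(v_x \cos\theta_i + v_y \sin\theta_i)$, and it down-weights dynamic targets with smooth Gaussian weights on the residuals. The function names, the sign convention, and the parameters `sigma` and `iterations` are all hypothetical.

```python
import math

def solve_velocity(azimuths, radial_vels, weights):
    """Weighted least squares for (vx, vy) under the assumed model
    v_r = -(vx*cos(theta) + vy*sin(theta)), solved via 2x2 normal equations."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for th, vr, w in zip(azimuths, radial_vels, weights):
        c, s = math.cos(th), math.sin(th)
        a11 += w * c * c
        a12 += w * c * s
        a22 += w * s * s
        b1 -= w * c * vr
        b2 -= w * s * vr
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("azimuths do not constrain both velocity components")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def soft_weights(azimuths, radial_vels, velocity, sigma):
    """Smooth per-target weights: targets whose Doppler disagrees with the
    current velocity estimate (likely dynamic or clutter) approach zero."""
    vx, vy = velocity
    return [
        math.exp(-(((vr + vx * math.cos(th) + vy * math.sin(th)) / sigma) ** 2))
        for th, vr in zip(azimuths, radial_vels)
    ]

def estimate_velocity(azimuths, radial_vels, sigma=0.3, iterations=3):
    """Alternate between the closed-form solve and smooth reweighting."""
    weights = [1.0] * len(azimuths)
    velocity = solve_velocity(azimuths, radial_vels, weights)
    for _ in range(iterations):
        weights = soft_weights(azimuths, radial_vels, velocity, sigma)
        velocity = solve_velocity(azimuths, radial_vels, weights)
    return velocity, weights

# Toy scene: five stationary landmarks plus one moving target at 10 degrees.
true_v = (1.0, 0.2)
azimuths = [math.radians(a) for a in (-40, -20, 0, 20, 40, 10)]
radial_vels = [-(true_v[0] * math.cos(t) + true_v[1] * math.sin(t)) for t in azimuths]
radial_vels[-1] += 1.5  # the moving target violates the static-scene model

naive = solve_velocity(azimuths, radial_vels, [1.0] * len(azimuths))
velocity, weights = estimate_velocity(azimuths, radial_vels)
```

Every step here is a closed-form solve or a smooth function, so the same computation could be backpropagated through if ported to an automatic-differentiation framework. In this toy scene the unweighted solve is pulled away from the true velocity by the moving target, while the reweighted estimate recovers it and assigns the moving target a near-zero weight.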