ICD-Net: Inertial Covariance Displacement Network for Drone Visual-Inertial SLAM

Tali Orlev Shapira; Itzik Klein

TL;DR

This work introduces ICD-Net, a two-head neural network that learns inertial displacement and its uncertainty directly from raw IMU data to augment visual-inertial SLAM. The network predicts per-axis covariances alongside the displacements, and the displacements enter the VINS-Fusion optimization as residuals weighted by those covariances. This compensates for calibration errors, sensor noise, and the high dynamics common in drone flight. On challenging high-speed drone sequences, the method reduces mean absolute pose error (APE) by more than 38% relative to standard VINS-Fusion and remains robust during camera blackouts, with the uncertainty estimates weighting the neural constraints in the optimization. ICD-Net also functions as a standalone inertial odometry system and could be adopted in Kalman-filter-based pipelines and other probabilistic estimators.
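The covariance weighting described above can be illustrated with a minimal sketch. It is not the authors' implementation (VINS-Fusion itself is a C++ system), and it assumes a diagonal covariance, a displacement expressed in the same frame as the position states, and hypothetical names (`weighted_displacement_residual`, `cost`) and toy numbers chosen only for illustration.

```python
def weighted_displacement_residual(p_i, p_j, d_hat, sigma):
    """Whitened residual for one displacement constraint.

    p_i, p_j : position states at the start and end of the window (assumed frame)
    d_hat    : displacement predicted by the network (hypothetical input)
    sigma    : per-axis standard deviations predicted by the network
    """
    if any(s <= 0.0 for s in sigma):
        raise ValueError("standard deviations must be positive")
    # Difference between the displacement implied by the states and the
    # predicted displacement, scaled per axis by the predicted uncertainty.
    return [((pj - pi) - d) / s for pi, pj, d, s in zip(p_i, p_j, d_hat, sigma)]


def cost(residual):
    """Least-squares cost contributed by one whitened residual."""
    return 0.5 * sum(r * r for r in residual)


# Toy values, not taken from the paper.
p_i = [0.0, 0.0, 0.0]
p_j = [1.0, 0.5, 0.1]
d_hat = [0.9, 0.5, 0.0]

confident = weighted_displacement_residual(p_i, p_j, d_hat, [0.05, 0.05, 0.05])
uncertain = weighted_displacement_residual(p_i, p_j, d_hat, [0.5, 0.5, 0.5])

print("cost with small predicted sigma:", cost(confident))
print("cost with large predicted sigma:", cost(uncertain))
```

With the same mismatch between states and prediction, a tenfold larger predicted standard deviation shrinks the cost by a factor of one hundred, so an uncertain prediction pulls the optimizer far less than a confident one.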

Abstract

Visual-inertial SLAM systems often exhibit suboptimal performance due to multiple confounding factors, including imperfect sensor calibration, noisy measurements, rapid motion dynamics, low illumination, and the inherent limitations of traditional inertial navigation integration methods. These issues are particularly problematic in drone applications, where robust and accurate state estimation is critical for safe autonomous operation. In this work, we present ICD-Net, a novel framework that enhances visual-inertial SLAM performance by learning to process raw inertial measurements and generate displacement estimates with associated uncertainty quantification. Rather than relying on analytical inertial sensor models that struggle with real-world sensor imperfections, our method directly extracts displacement maps from sensor data while simultaneously predicting measurement covariances that reflect estimation confidence. We integrate ICD-Net outputs as additional residual constraints into the VINS-Fusion optimization framework, where the predicted uncertainties weight the neural network contributions relative to the traditional visual and inertial terms. The learned displacement constraints provide complementary information that compensates for various error sources in the SLAM pipeline. Our approach can be used both under normal operating conditions and in situations of camera inconsistency or visual degradation. Experimental evaluation on challenging high-speed drone sequences demonstrates that our approach significantly improves trajectory estimation accuracy compared to standard VINS-Fusion, with more than a 38% improvement in mean absolute pose error (APE), and that the uncertainty estimates are crucial for maintaining system robustness. Our method shows that neural network enhancement can effectively address multiple sources of SLAM degradation while meeting real-time performance requirements.
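To make the reported metric concrete, the following minimal sketch computes a translational mean APE and the relative improvement between two estimates. It assumes the trajectories are already time-associated and aligned to the ground truth, uses hypothetical function names, and runs on toy positions that are not results from the paper.

```python
import math


def mean_ape(estimated, ground_truth):
    """Mean Euclidean position error over matched, pre-aligned poses."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(errors) / len(errors)


def relative_improvement(baseline_error, improved_error):
    """Percentage reduction of the error relative to the baseline."""
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Toy positions in metres, for illustration only.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
baseline = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (2.0, 0.4, 0.0)]
augmented = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]

ape_baseline = mean_ape(baseline, ground_truth)
ape_augmented = mean_ape(augmented, ground_truth)
print("improvement in mean APE (%):", relative_improvement(ape_baseline, ape_augmented))
```

A figure such as "more than 38% improvement in mean APE" is this kind of relative reduction, computed against the baseline's mean error.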
