Benchmarking Visual Feature Representations for LiDAR-Inertial-Visual Odometry Under Challenging Conditions

Eunseon Choi; Junwoo Hong; Daehan Lee; Sanghyun Park; Hyunyoung Jo; Sunyoung Kim; Changho Kang; Seongsam Kim; Yonghan Jung; Jungwook Park; Seul Koo; Soohee Han

Abstract

Accurate localization in autonomous driving is critical for successful missions, including environmental mapping and survivor searches. In visually challenging environments, including low-light conditions, overexposure, illumination changes, and high parallax, the performance of conventional visual odometry methods degrades significantly, undermining robust robotic navigation. Researchers have recently proposed LiDAR-inertial-visual odometry (LIVO) frameworks that integrate LiDAR, IMU, and camera sensors to address these challenges. This paper extends the FAST-LIVO2-based framework by introducing a hybrid approach that integrates direct photometric methods with descriptor-based feature matching. For descriptor-based feature matching, this work considers four extractor-matcher pairs: ORB with the Hamming distance, SuperPoint with SuperGlue, SuperPoint with LightGlue, and XFeat with mutual nearest neighbor search. The proposed configurations are benchmarked on accuracy, computational cost, and feature tracking stability, enabling a quantitative comparison of the adaptability and applicability of visual descriptors. The experimental results show that the proposed hybrid approach outperforms the conventional sparse-direct method. Whereas the sparse-direct method often fails to converge in regions where illumination changes cause photometric inconsistency, the proposed approach maintains robust performance under the same conditions. Furthermore, the hybrid approach with learning-based descriptors enables robust and reliable visual state estimation across challenging environments.
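
To make the two classical matching rules named in the abstract concrete, the following is a minimal sketch of mutual nearest neighbor matching under the Hamming distance. It is an illustration, not the paper's implementation: the function names `hamming` and `mutual_nearest_neighbors`, the `max_dist` threshold, and the toy 8-bit descriptors stored as Python integers are all assumptions. Real ORB descriptors are 256-bit, and real-valued descriptors would replace the Hamming distance with a similarity such as a dot product.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def mutual_nearest_neighbors(desc_a, desc_b, max_dist=None):
    """Return (i, j, dist) for pairs that are each other's nearest neighbor.

    desc_a, desc_b: lists of binary descriptors encoded as ints (toy format).
    max_dist: optional hypothetical rejection threshold on the distance.
    """
    if not desc_a or not desc_b:
        return []
    # Nearest neighbor in B for every descriptor in A, and vice versa.
    # Ties are resolved in favor of the lowest index.
    nn_ab = [min(range(len(desc_b)), key=lambda j: hamming(a, desc_b[j]))
             for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: hamming(desc_a[i], b))
             for b in desc_b]
    matches = []
    for i, j in enumerate(nn_ab):
        # Keep the pair only if the choice is mutual.
        if nn_ba[j] == i:
            d = hamming(desc_a[i], desc_b[j])
            if max_dist is None or d <= max_dist:
                matches.append((i, j, d))
    return matches


frame_a = [0b11110000, 0b00001111, 0b10101010]
frame_b = [0b00001110, 0b11110001, 0b01010101]
print(mutual_nearest_neighbors(frame_a, frame_b))
```

The mutual check discards the third descriptor of `frame_a`: its nearest neighbor in `frame_b` prefers a different descriptor, so the pair is not kept. This brute-force version costs O(NM) distance evaluations, which is acceptable for a sketch but not for a real-time front end.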

Paper Structure (22 sections, 8 figures, 6 tables)

Introduction
Related works
Direct Visual Odometry
Feature-based Visual Odometry
Learning-based Visual Odometry
Hybrid Sparse-Direct and Feature-based Visual Odometry
Visual-Inertial Odometry of FAST-LIVO2
Visual Measurement Model
Local Mapping
Proposed Hybrid Visual-Inertial Odometry Framework
Coarse-to-Fine Visual Measurement Module
Descriptor-based Local Mapping
Experiments
Datasets
Algorithm adopted
...and 7 more sections

Figures (8)

Figure 1: Framework for the LiDAR-inertial-visual odometry (LIVO) system built on FAST-LIVO2 to integrate four pairs of visual feature extractors and matchers
Figure 2: Step-by-step hybrid visual measurement update process
Figure 3: Mapping results on the AMValley03 sequence using sliding-window local mapping: (a) sparse-direct only, (b) ORB + Hamming distance, (c) SuperPoint + SuperGlue, (d) SuperPoint + LightGlue, and (e) XFeat + mutual nearest neighbor search. Square red insets show zoomed-in point-cloud details, and the blue line traces the UAV’s flight path.
Figure 4: Trajectories on the Cave2 sequence. The sparse-direct (SD)-only method exhibits pronounced deviations at specific segments, which are caused by high parallax in (A) and illumination changes in (B)–(D). The color bar indicates the absolute trajectory error (m), and trajectories are aligned by translating the initial poses (origin alignment) to emphasize where and how drift accumulates without global alignment.
Figure 5: Resulting trajectories on the Cave1 sequence. The estimated trajectories were aligned with the ground truth using full SE(3) alignment (rotation and translation). Gray segments indicate regions where the position error falls within the range of $-0.1\,\mathrm{m}$ to $0.1\,\mathrm{m}$, and in the yaw plot, gray bands correspond to yaw errors from $-1^{\circ}$ to $1^{\circ}$. Here, SD, HD, SP, SG, LG, and MNN refer to Sparse-Direct, Hamming distance, SuperPoint, SuperGlue, LightGlue, and mutual nearest neighbor search, respectively.
...and 3 more figures
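
The Figure 4 caption relies on origin alignment, in which the estimated trajectory is only translated so that its first pose coincides with the first ground-truth pose, before the absolute trajectory error is computed. A minimal sketch of that computation follows. It assumes positions only (no orientation), time-synchronized trajectories of equal length, and hypothetical function names `origin_align` and `ate_rmse`; the paper's evaluation tooling is not specified here.

```python
import math


def origin_align(est, gt):
    """Translate the estimate so its first position matches the first ground-truth position."""
    offset = [g - e for e, g in zip(est[0], gt[0])]
    return [tuple(p + o for p, o in zip(pos, offset)) for pos in est]


def ate_rmse(est, gt):
    """Root-mean-square absolute trajectory error over paired positions (meters)."""
    sq_errors = [sum((e - g) ** 2 for e, g in zip(pe, pg))
                 for pe, pg in zip(est, gt)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy data: the estimate starts with a constant offset and drifts 1 m in z at the end.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(1.0, 1.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 1.0)]

aligned = origin_align(est, gt)
print(ate_rmse(aligned, gt))
```

Because no rotation or global fit is applied, any drift that accumulates along the path stays visible in the error, unlike the full SE(3) alignment used for Figure 5, which spreads the residual over the whole trajectory.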
