Cross-Modal Reinforcement Learning for Navigation with Degraded Depth Measurements

Omkar Sawant; Luca Zanatta; Grzegorz Malczyk; Kostas Alexis

Abstract

This paper presents a cross-modal learning framework that exploits complementary information from depth and grayscale images for robust navigation. We introduce a Cross-Modal Wasserstein Autoencoder (CMWAE) that learns shared latent representations by enforcing cross-modal consistency, enabling the system to infer depth-relevant features from grayscale observations when depth measurements are corrupted. The learned representations are integrated with a Reinforcement Learning (RL)-based policy for collision-free navigation in unstructured environments, including when depth sensors are degraded by adverse conditions such as poor lighting or reflective surfaces. Simulation and real-world experiments demonstrate that our approach maintains robust performance under significant depth degradation and transfers successfully to real environments.
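The abstract describes the CMWAE only at a high level. As a rough illustration, here is a minimal sketch of one plausible cross-modal, WAE-style objective, under these assumptions: separate depth and grayscale encoders produce latent codes; both codes must decode to the target depth image; paired codes are pulled together; and, as in WAE-MMD, each code distribution is matched to a prior with a maximum mean discrepancy (MMD) penalty. The function names, the Gaussian (RBF) kernel, the equal weighting of the first three terms, and the values of `lam` and `sigma` are all hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equally sized flattened images or codes."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rbf(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(z: List[Vector], p: List[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between codes z and prior samples p."""
    n, m = len(z), len(p)
    k_zz = sum(rbf(a, b, sigma) for a in z for b in z) / (n * n)
    k_pp = sum(rbf(a, b, sigma) for a in p for b in p) / (m * m)
    k_zp = sum(rbf(a, b, sigma) for a in z for b in p) / (n * m)
    return k_zz + k_pp - 2.0 * k_zp


def cmwae_loss(
    depth_gt: List[Vector],          # target (assumed uncorrupted) depth images, flattened
    recon_from_depth: List[Vector],  # depth decoded from the depth latent
    recon_from_gray: List[Vector],   # depth decoded from the grayscale latent
    z_depth: List[Vector],           # latent codes from the depth encoder
    z_gray: List[Vector],            # latent codes from the grayscale encoder
    prior: List[Vector],             # samples from the latent prior, e.g. N(0, I)
    lam: float = 10.0,               # hypothetical weight of the MMD penalty
    sigma: float = 1.0,              # hypothetical RBF bandwidth
) -> float:
    n = len(depth_gt)
    # Same-modality reconstruction: depth latent -> depth image.
    rec = sum(mse(t, r) for t, r in zip(depth_gt, recon_from_depth)) / n
    # Cross-modal term: the grayscale latent must also decode to the depth image.
    cross = sum(mse(t, r) for t, r in zip(depth_gt, recon_from_gray)) / n
    # Consistency term: paired codes from both modalities should coincide.
    align = sum(mse(a, b) for a, b in zip(z_depth, z_gray)) / n
    # WAE-MMD style regulariser: match both code distributions to the prior.
    reg = mmd2(z_depth, prior, sigma) + mmd2(z_gray, prior, sigma)
    return rec + cross + align + lam * reg


if __name__ == "__main__":
    rng = random.Random(0)
    batch, pixels, latent = 8, 16, 4

    def images() -> List[List[float]]:
        return [[rng.random() for _ in range(pixels)] for _ in range(batch)]

    def codes() -> List[List[float]]:
        return [[rng.gauss(0.0, 1.0) for _ in range(latent)] for _ in range(batch)]

    loss = cmwae_loss(images(), images(), images(), codes(), codes(), codes())
    print(f"toy CMWAE-style loss: {loss:.4f}")
```

The cross-modal term is what lets a grayscale-only code stand in for depth when the depth channel is unreliable. The MMD penalty constrains the aggregate code distribution rather than each individual code, which is the distinguishing feature of Wasserstein autoencoders compared with the per-sample KL term of a VAE.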

Paper Structure (16 sections, 8 equations, 8 figures, 3 tables)

This paper contains 16 sections, 8 equations, 8 figures, 3 tables.

INTRODUCTION
RELATED WORK
PROBLEM FORMULATION
METHODOLOGY
Cross-Modal Wasserstein Autoencoder
Corruption scheme 1 ($S_1$)
Corruption scheme 2 ($S_2$)
Navigation Policy Learning
Reward Function
Network Architecture
Simulation Environment
EVALUATION STUDIES
Evaluations for CMWAE
Evaluations for the RL policy in Aerial Gym
Real-world experiments

Figures (8)

Figure 1: Real-world navigation under depth sensor degradation with preserved grayscale image.
Figure 2: Architecture of the Cross-Modal Wasserstein Autoencoder (CMWAE) for joint depth–grayscale representation learning.
Figure 3: Depth and grayscale images are encoded into a shared latent space via the CMWAE. The cross-modal representation is then combined with state information and processed by the RL policy, which outputs velocity actions for a quadrotor platform.
Figure 4: Top-down view of the training environment at each curriculum level.
Figure 5: MSE and SSIM box plots for the CMWAE trained under corruption scheme $S_1$.
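Figure 3 describes the data flow from the encoded observation to a velocity command. A minimal sketch of that last stage follows, assuming a two-layer MLP with tanh activations, a 3-D linear-velocity action bounded by a hypothetical `v_max`, and arbitrary latent and state sizes. The paper's actual network is described in its Network Architecture section, which is not part of this excerpt, so `VelocityPolicy`, `init_layer`, `dense`, and all sizes here are illustrative only. In practice the weights would be learned by the RL algorithm rather than left at their random initial values.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Uniform fan-in initialisation of a dense layer (weights, biases)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Affine map W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class VelocityPolicy:
    """Hypothetical MLP policy: [latent code | robot state] -> bounded velocity command."""

    def __init__(self, latent_dim: int, state_dim: int, hidden: int = 64,
                 action_dim: int = 3, v_max: float = 1.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim = latent_dim + state_dim
        self.hidden = init_layer(self.in_dim, hidden, rng)
        self.out = init_layer(hidden, action_dim, rng)
        self.v_max = v_max

    def act(self, z: Sequence[float], state: Sequence[float]) -> List[float]:
        # Combine the cross-modal representation with state information.
        obs = list(z) + list(state)
        if len(obs) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} inputs, got {len(obs)}")
        h = [math.tanh(v) for v in dense(obs, self.hidden)]
        # tanh squashing keeps every velocity component within [-v_max, v_max].
        return [self.v_max * math.tanh(v) for v in dense(h, self.out)]


if __name__ == "__main__":
    policy = VelocityPolicy(latent_dim=32, state_dim=9, v_max=1.5)
    z = [0.0] * 32     # stand-in for a CMWAE latent code
    state = [0.1] * 9  # stand-in for e.g. goal direction, velocity, attitude
    print(policy.act(z, state))
```

Bounding the output with tanh is one common way to keep commanded velocities within the platform's limits. Clipping or scaling in the environment wrapper would be an equally valid alternative.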

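Figure 5 reports MSE and SSIM distributions as box plots. The sketch below shows how per-sample metrics of this kind could be computed and summarised, assuming depth images normalised to [0, 1] and flattened to lists. It uses a simplified global SSIM (a single window spanning the whole image) with the standard constants K1 = 0.01 and K2 = 0.03. The common SSIM formulation instead averages the index over local, typically Gaussian-weighted 11×11 windows, and the paper's exact evaluation settings are not given in this excerpt. The helper names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal size."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM over the whole image (no sliding Gaussian window)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def box_summary(values: List[float]) -> Dict[str, float]:
    """Five-number summary behind a box plot (quartiles via statistics.quantiles)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


if __name__ == "__main__":
    gt = [[0.2, 0.4, 0.6, 0.8], [0.1, 0.3, 0.5, 0.7], [0.9, 0.7, 0.5, 0.3]]
    pred = [[0.25, 0.4, 0.55, 0.8], [0.1, 0.35, 0.5, 0.6], [0.8, 0.7, 0.5, 0.4]]
    print(box_summary([mse(g, p) for g, p in zip(gt, pred)]))
    print(box_summary([global_ssim(g, p) for g, p in zip(gt, pred)]))
```

Lower MSE and higher SSIM (at most 1) indicate better reconstructions. The summary reports the raw minimum and maximum rather than the 1.5×IQR whisker rule that some plotting libraries apply by default.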