MARVO: Marine-Adaptive Radiance-aware Visual Odometry

Sacchin Sundar, Atman Kikani, Aaliya Alam, Sumukh Shrote, A. Nayeemulla Khan, A. Shahina

TL;DR

MARVO tackles underwater visual odometry by fusing physics-aware front-end perception with a probabilistic visual–inertial–barometric backend and an offline RL-driven pose-graph optimizer. The front-end extends LoFTR with a Physics-Aware Radiance Adapter (PARA) that compensates for wavelength-dependent attenuation, enabling stable semi-dense correspondences under turbidity. The back-end uses a fixed-lag GTSAM estimator with PARA-enhanced visual factors and barometric depth, followed by an RL-based pose-graph optimizer (RL-PGO) that produces globally consistent trajectories in SE(2) before they are restored to SE(3). Training relies on synthetic data generated via SyreaNet plus fine-tuning on real field data, and evaluations show improved AUC, ATE, and drift over baselines on both synthetic and real underwater sequences. The work highlights practical gains for underwater robotics, notes limitations due to data scarcity, and suggests future work in 3D mapping and acoustic depth integration.

Abstract

Underwater visual localization remains challenging due to wavelength-dependent attenuation, poor texture, and non-Gaussian sensor noise. We introduce MARVO, a physics-aware, learning-integrated odometry framework that fuses underwater image formation modeling, differentiable matching, and reinforcement-learning optimization. At the front-end, we extend a transformer-based feature matcher with a Physics-Aware Radiance Adapter that compensates for color channel attenuation and contrast loss, yielding geometrically consistent feature correspondences under turbidity. These semi-dense matches are combined with inertial and pressure measurements inside a factor-graph backend, where we formulate a keyframe-based visual-inertial-barometric estimator using the GTSAM library. Each keyframe introduces (i) pre-integrated IMU motion factors, (ii) MARVO-derived visual pose factors, and (iii) barometric depth priors, giving a full-state MAP estimate in real time. Lastly, we introduce a Reinforcement-Learning-based Pose-Graph Optimizer that refines global trajectories beyond local minima of classical least-squares solvers by learning optimal retraction actions on SE(2).
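
To make the phrase "retraction actions on SE(2)" concrete, the sketch below applies a small tangent-space update to a planar pose through the standard exponential map. It is only an illustration under stated assumptions: the function names (`se2_exp`, `se2_compose`, `retract`) and the `(vx, vy, omega)` action layout are hypothetical, and this section does not specify how the paper's agent actually parameterizes its actions.

```python
import math

def wrap_angle(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def se2_exp(vx, vy, omega):
    """Exponential map: twist (vx, vy, omega) -> SE(2) pose (x, y, theta)."""
    if abs(omega) < 1e-9:
        # Near-zero rotation: the motion is (to first order) a pure translation.
        return (vx, vy, omega)
    a = math.sin(omega) / omega
    b = (1.0 - math.cos(omega)) / omega
    return (a * vx - b * vy, b * vx + a * vy, omega)

def se2_compose(p, q):
    """Compose two poses: apply q in the local frame of p."""
    x, y, th = p
    qx, qy, qth = q
    c, s = math.cos(th), math.sin(th)
    return (x + c * qx - s * qy, y + s * qx + c * qy, wrap_angle(th + qth))

def retract(pose, delta):
    """Retraction: move `pose` along the tangent vector `delta` (hypothetical action)."""
    return se2_compose(pose, se2_exp(*delta))

# A quarter-circle arc of radius 1 starting at the origin.
print(retract((0.0, 0.0, 0.0), (math.pi / 2, 0.0, math.pi / 2)))
```

Because the update is applied through the group operation rather than by adding to `(x, y, theta)` directly, the result is always a valid pose and the heading stays wrapped, which is the usual reason for working with retractions in pose-graph optimization.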

Paper Structure

This paper contains 29 sections, 10 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Overview of MARVO. PARA enhances LoFTR features using physically-informed radiance correction. Corrected visual factors are fused with IMU and barometric depth in a GTSAM factor graph to produce real-time VO. An offline reinforcement-learning agent performs pose-graph refinement to obtain globally consistent trajectories.
  • Figure 2: Examples of synthetic data used in MARVO. Each row shows an example from a different dataset (ScanNet, TartanAir, Hypersim): original RGB image, corresponding depth map, and the generated synthetic underwater image with simulated attenuation and scattering.
  • Figure 3: Qualitative feature matching comparison: MARVO produces denser and more geometrically stable correspondences than SuperGlue, ORB, and LoFTR under underwater conditions characterized by turbidity, color attenuation, and low texture. Conventional matchers degrade noticeably, while MARVO maintains semi-dense and spatially coherent matches through physics-aware radiance modulation.
  • Figure 4: PARA architecture: The Physics-Aware Radiance Adapter takes coarse CNN features and predicts per-pixel attenuation and backscatter fields. These are used to generate a radiance correction mask $\Gamma(x)$, which normalizes intermediate descriptors before transformer matching. PARA compensates for wavelength-dependent attenuation, color imbalance, and contrast degradation common in underwater environments.
  • Figure 5: Reinforcement learning-based pose-graph optimization. A GNN encoder maps an initial pose graph to latent edge features, which condition a recurrent SAC agent. The agent iteratively applies retraction actions; a final linear least-squares step produces the optimized pose graph.
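
The attenuation and backscatter terms mentioned in the captions of Figures 2 and 4 can be illustrated with the widely used simplified underwater image formation model, $I_c = J_c\,e^{-\beta_c d} + B_c\,(1 - e^{-\beta_c d})$, where $J$ is the clear image, $d$ the scene depth, $\beta_c$ a per-channel attenuation coefficient, and $B_c$ the veiling light. The sketch below is an assumption-laden illustration, not the SyreaNet generator and not PARA itself: the function names and the coefficient values are hypothetical, and PARA's learned mask $\Gamma(x)$ acts on feature descriptors rather than inverting this equation in closed form.

```python
import numpy as np

# Hypothetical per-channel (R, G, B) coefficients: red attenuates fastest.
BETA = (0.40, 0.12, 0.08)       # attenuation, 1/m (illustrative values)
VEIL = (0.05, 0.35, 0.45)       # veiling-light colour (illustrative values)

def transmission(depth, beta=BETA):
    """Per-pixel, per-channel transmission exp(-beta_c * d); shape HxWx3."""
    return np.exp(-np.asarray(beta) * depth[..., None])

def synthesize_underwater(rgb, depth, beta=BETA, veil=VEIL):
    """Degrade a clear image (HxWx3, values in [0, 1]) given depth in metres (HxW)."""
    t = transmission(depth, beta)
    return rgb * t + np.asarray(veil) * (1.0 - t)

def correct_radiance(img, depth, beta=BETA, veil=VEIL, eps=1e-3):
    """Invert the same model when depth and coefficients are known."""
    t = transmission(depth, beta)
    restored = (img - np.asarray(veil) * (1.0 - t)) / np.maximum(t, eps)
    return np.clip(restored, 0.0, 1.0)

rng = np.random.default_rng(0)
clear = rng.uniform(0.0, 1.0, size=(4, 4, 3))
depth = rng.uniform(0.5, 5.0, size=(4, 4))
degraded = synthesize_underwater(clear, depth)
print(np.abs(correct_radiance(degraded, depth) - clear).max())
```

In this toy setting the inversion is exact up to floating-point error because depth and coefficients are known. In the field they are not, which is the motivation for predicting per-pixel attenuation and backscatter fields from image features as described for Figure 4.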