StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

Zeyu Ren; Xiang Li; Yiran Wang; Zeyu Zhang; Hao Tang

TL;DR

StereoAdapter-2 replaces the conventional ConvGRU updater with ConvSS2D, a novel operator based on selective state space models. Its four-directional scanning strategy naturally aligns with epipolar geometry while capturing vertical structural consistency, enabling efficient long-range spatial propagation within a single update step at linear computational complexity.
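As a rough illustration of the scanning idea, the NumPy sketch below runs a scalar-state selective recurrence over an H×W feature map in four directions and sums the results. It is not the authors' ConvSS2D. The weights `w_delta`, `w_b`, `w_c` and `a` are hypothetical scalars, the convolutional and channel projections are omitted, and each row and column is scanned independently; whether ConvSS2D instead flattens the whole map is left open here. For a rectified stereo pair, the left-right and right-left passes run along epipolar lines and the vertical passes add column context. Each direction costs O(HW).

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def selective_scan(x, w_delta=1.0, w_b=0.5, w_c=0.5, a=-0.5):
    """Scalar-state selective SSM scanned along the last axis of x.

    Step size delta_t and projections b_t, c_t are computed from x_t itself
    (input-dependent selectivity). w_delta, w_b, w_c and a are hypothetical
    scalar weights, not values from the paper; a < 0 keeps the recurrence stable.
    """
    delta = softplus(w_delta * x)
    b = w_b * x
    c = w_c * x
    h = np.zeros(x.shape[:-1])
    y = np.empty(x.shape)
    for t in range(x.shape[-1]):
        # Linear recurrence: h_t = exp(delta_t * a) * h_{t-1} + delta_t * b_t * x_t
        h = np.exp(delta[..., t] * a) * h + delta[..., t] * b[..., t] * x[..., t]
        y[..., t] = c[..., t] * h
    return y


def four_direction_scan(feat):
    """Scan an (H, W) map in four directions and sum the outputs.

    Rows and columns are scanned independently (an assumption of this sketch).
    """
    left_right = selective_scan(feat)                        # along rows (epipolar lines when rectified)
    right_left = selective_scan(feat[:, ::-1])[:, ::-1]      # rows, reversed
    top_bottom = selective_scan(feat.T).T                    # along columns
    bottom_top = selective_scan(feat[::-1, :].T).T[::-1, :]  # columns, reversed
    return left_right + right_left + top_bottom + bottom_top
```

If you perturb one pixel and call `four_direction_scan` again, the output changes along that pixel's entire row and column and nowhere else. That cross-shaped receptive field is reached in a single pass. A single 3×3 convolutional update, by contrast, moves information only to the 8 immediate neighbours.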

Abstract

Stereo depth estimation is fundamental to underwater robotic perception, yet it suffers from severe domain shifts caused by wavelength-dependent light attenuation, scattering, and refraction. Recent approaches adapt monocular foundation models to underwater scenes using GRU-based iterative refinement. However, the sequential gating and local convolutional kernels of GRUs require multiple iterations to propagate disparity over long ranges, which limits performance in large-disparity and textureless underwater regions. In this paper, we propose StereoAdapter-2, which replaces the conventional ConvGRU updater with ConvSS2D, a novel operator based on selective state space models. ConvSS2D employs a four-directional scanning strategy that naturally aligns with epipolar geometry while capturing vertical structural consistency, enabling efficient long-range spatial propagation within a single update step at linear computational complexity. Furthermore, we construct UW-StereoDepth-80K, a large-scale synthetic underwater stereo dataset with diverse baselines, attenuation coefficients, and scattering parameters. It is generated by a two-stage pipeline that combines semantic-aware style transfer with geometry-consistent novel view synthesis. Combined with the dynamic LoRA adaptation inherited from StereoAdapter, our framework achieves state-of-the-art zero-shot performance on underwater benchmarks, with improvements of 17% on TartanAir-UW and 7.2% on SQUID. Real-world validation on the BlueROV2 platform demonstrates the robustness of our approach. Code: https://github.com/AIGeeksGroup/StereoAdapter-2. Website: https://aigeeksgroup.github.io/StereoAdapter-2.
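The abstract lists attenuation coefficients and scattering parameters among the dataset's variation factors but does not give a formation model. The dataset itself is produced by a generative pipeline. To show what such parameters control, the sketch below uses a common simplified underwater image formation model, I_c = J_c * exp(-beta_c * z) + B_c * (1 - exp(-beta_c * z)), together with the rectified-stereo relation disparity = f * B / Z. Both are assumptions for illustration, not the UW-StereoDepth-80K pipeline. The coefficient values are hypothetical, and range and depth are treated as equal for simplicity.

```python
import numpy as np


def underwater_degrade(clean_rgb, depth_m, beta_rgb, veil_rgb):
    """Simplified underwater image formation (an assumed model, see lead-in).

    clean_rgb: (H, W, 3) scene radiance J in [0, 1]
    depth_m:   (H, W) distance to the camera in metres (range ~ depth here)
    beta_rgb:  per-channel attenuation coefficients beta_c in 1/m (hypothetical)
    veil_rgb:  per-channel backscatter / veiling light B_c in [0, 1] (hypothetical)
    """
    beta = np.asarray(beta_rgb, dtype=float)
    veil = np.asarray(veil_rgb, dtype=float)
    # Direct signal decays as exp(-beta_c * z); backscattered veiling light fills in the rest.
    transmission = np.exp(-np.asarray(depth_m, dtype=float)[..., None] * beta)
    return clean_rgb * transmission + veil * (1.0 - transmission)


def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Rectified stereo: disparity in pixels = focal length * baseline / depth."""
    return focal_px * baseline_m / depth_m
```

Take beta = (0.6, 0.12, 0.08) per metre for (R, G, B). At 5 m the direct signal retains only exp(-3) ≈ 5% of red but exp(-0.4) ≈ 67% of blue, which produces the familiar blue-green cast. For geometry, with f = 500 px and B = 0.12 m, a point at 3 m has a disparity of 20 px, and doubling the baseline doubles it. Varying the baseline is one way a dataset can cover the large-disparity cases the abstract mentions.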

Paper Structure (34 sections, 6 equations, 9 figures, 8 tables)

Introduction
Related Work
Deep Stereo Matching
Underwater depth estimation and datasets
State Space Model
Preliminaries
The Proposed Method
Overview
Feature Extraction
Correlation Pyramids Building
Iterative Disparity Estimation
Input-dependent Selectivity
Scanning Strategy
Data Synthesis: UW-StereoDepth-80K
Underwater Style Transfer
...and 19 more sections

Figures (9)

Figure 1: Conceptual comparison. The Gated Recurrent Unit (GRU) relies on multiple non-linear gates and a candidate state $\tilde{h}_t$ to update the hidden state $h_t$. Its gating mechanism introduces a non-linear recursion that is difficult to analyze over long sequences. The Selective SSM streamlines this into a linear recurrence. Because its parameters are generated dynamically from the input $x_t$, the Selective SSM retains "input-dependent selectivity" and adaptively modulates information flow. We leverage these characteristics of the selective SSM to design ConvSS2D, enabling an adaptive iterative refinement process.
Figure 2: Detailed architecture of StereoAdapter-2. Our model iteratively refines disparity by integrating a Mamba Adapter. The refinement step is powered by the ConvSS2D operator, which enables adaptive, long-range spatial information propagation through multi-directional selective scanning.
Figure 3: Data synthesis pipeline. Semantic-aware style transfer and geometry-consistent novel view synthesis rendering pipeline for the UW-StereoDepth-80K dataset.
Figure 4: Qualitative results of zero-shot stereo depth estimation.
Figure 5: Qualitative results of zero-shot underwater stereo depth estimation, obtained by deploying the model on a robotic platform.
...and 4 more figures
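The contrast in Figure 1 can be checked numerically. The minimal sketch below uses scalar states, hypothetical weights, and one common GRU convention. The selective SSM step uses the simplified discretization a_bar_t = exp(delta_t * a), b_bar_t = delta_t * b_t * x_t, where delta_t and b_t depend on x_t. Each SSM step is therefore an affine map h -> a_bar_t * h + b_bar_t, and consecutive steps compose associatively. As a result, the hidden states can be computed by prefix composition and match the sequential loop. The GRU step has no such decomposition because its gates depend on h.

```python
import numpy as np
from itertools import accumulate


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(h, x, w):
    """One scalar GRU step (one common convention); w holds hypothetical weights.

    The gates z, r and the candidate state depend on h through sigmoid/tanh,
    so the update is non-linear in h.
    """
    z = sigmoid(w["wz"] * x + w["uz"] * h)            # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)            # reset gate
    h_tilde = np.tanh(w["wh"] * x + w["uh"] * r * h)  # candidate state
    return (1.0 - z) * h + z * h_tilde


def ssm_coefficients(x, a=-0.5, w_delta=1.0, w_b=0.5):
    """Input-dependent coefficients so that h_t = a_bar * h_{t-1} + b_bar.

    Simplified discretization: a_bar = exp(delta * a), b_bar = delta * b * x,
    with delta = softplus(w_delta * x) and b = w_b * x (hypothetical weights).
    """
    delta = np.logaddexp(0.0, w_delta * x)
    return np.exp(delta * a), delta * (w_b * x) * x


def ssm_sequential(xs):
    """Run the selective SSM recurrence step by step from h_0 = 0."""
    h, states = 0.0, []
    for x in xs:
        a_bar, b_bar = ssm_coefficients(x)
        h = a_bar * h + b_bar
        states.append(h)
    return np.array(states)


def combine(first, second):
    """Compose affine maps h -> a*h + b, applying `first` then `second` (associative)."""
    a1, b1 = first
    a2, b2 = second
    return a1 * a2, a2 * b1 + b2


def ssm_associative(xs):
    """Same states via prefix composition; with h_0 = 0, h_t is the offset of the t-th prefix."""
    pairs = [ssm_coefficients(x) for x in xs]
    return np.array([b for _, b in accumulate(pairs, combine)])
```

Here `accumulate` evaluates the prefixes left to right. Associativity of `combine` is what would let a parallel prefix scan compute the same prefixes in logarithmic depth. `ssm_sequential` and `ssm_associative` agree, up to floating-point rounding, for any input sequence. By contrast, `gru_step` evaluated at the midpoint of two hidden states differs from the average of its outputs at those states, so the GRU update is not affine in h.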
