Table of Contents
Fetching ...

StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

Zeyu Ren, Xiang Li, Yiran Wang, Zeyu Zhang, Hao Tang

TL;DR

StereoAdapter-2 is proposed, which replaces the conventional ConvGRU updater with a novel ConvSS2D operator based on selective state space models that employs a four-directional scanning strategy that naturally aligns with epipolar geometry while capturing vertical structural consistency, enabling efficient long-range spatial propagation within a single update step at linear computational complexity.

Abstract

Stereo depth estimation is fundamental to underwater robotic perception, yet suffers from severe domain shifts caused by wavelength-dependent light attenuation, scattering, and refraction. Recent approaches leverage monocular foundation models with GRU-based iterative refinement for underwater adaptation; however, the sequential gating and local convolutional kernels in GRUs necessitate multiple iterations for long-range disparity propagation, limiting performance in large-disparity and textureless underwater regions. In this paper, we propose StereoAdapter-2, which replaces the conventional ConvGRU updater with a novel ConvSS2D operator based on selective state space models. The proposed operator employs a four-directional scanning strategy that naturally aligns with epipolar geometry while capturing vertical structural consistency, enabling efficient long-range spatial propagation within a single update step at linear computational complexity. Furthermore, we construct UW-StereoDepth-80K, a large-scale synthetic underwater stereo dataset featuring diverse baselines, attenuation coefficients, and scattering parameters through a two-stage generative pipeline combining semantic-aware style transfer and geometry-consistent novel view synthesis. Combined with dynamic LoRA adaptation inherited from StereoAdapter, our framework achieves state-of-the-art zero-shot performance on underwater benchmarks with 17% improvement on TartanAir-UW and 7.2% improvment on SQUID, with real-world validation on the BlueROV2 platform demonstrates the robustness of our approach. Code: https://github.com/AIGeeksGroup/StereoAdapter-2. Website: https://aigeeksgroup.github.io/StereoAdapter-2.

StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

TL;DR

StereoAdapter-2 is proposed, which replaces the conventional ConvGRU updater with a novel ConvSS2D operator based on selective state space models that employs a four-directional scanning strategy that naturally aligns with epipolar geometry while capturing vertical structural consistency, enabling efficient long-range spatial propagation within a single update step at linear computational complexity.

Abstract

Stereo depth estimation is fundamental to underwater robotic perception, yet suffers from severe domain shifts caused by wavelength-dependent light attenuation, scattering, and refraction. Recent approaches leverage monocular foundation models with GRU-based iterative refinement for underwater adaptation; however, the sequential gating and local convolutional kernels in GRUs necessitate multiple iterations for long-range disparity propagation, limiting performance in large-disparity and textureless underwater regions. In this paper, we propose StereoAdapter-2, which replaces the conventional ConvGRU updater with a novel ConvSS2D operator based on selective state space models. The proposed operator employs a four-directional scanning strategy that naturally aligns with epipolar geometry while capturing vertical structural consistency, enabling efficient long-range spatial propagation within a single update step at linear computational complexity. Furthermore, we construct UW-StereoDepth-80K, a large-scale synthetic underwater stereo dataset featuring diverse baselines, attenuation coefficients, and scattering parameters through a two-stage generative pipeline combining semantic-aware style transfer and geometry-consistent novel view synthesis. Combined with dynamic LoRA adaptation inherited from StereoAdapter, our framework achieves state-of-the-art zero-shot performance on underwater benchmarks with 17% improvement on TartanAir-UW and 7.2% improvment on SQUID, with real-world validation on the BlueROV2 platform demonstrates the robustness of our approach. Code: https://github.com/AIGeeksGroup/StereoAdapter-2. Website: https://aigeeksgroup.github.io/StereoAdapter-2.
Paper Structure (34 sections, 6 equations, 9 figures, 8 tables)

This paper contains 34 sections, 6 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Conceptual comparison. The Gated Recurrent Unit (GRU) relies on multiple non-linear gates and candidate states $\tilde{h}_t$ to update the hidden state ${h}_t$. Its complex gating mechanism introduces non-linear recursion that is difficult to analyze for long sequences. The Selective SSM streamlines this into a linear recurrence. By dynamically generating parameters from the input $x_t$, the Selective SSM maintains "input-dependent selectivity" to adaptively modulate information flow. We leveraged the characteristics of selective SSM to design ConvSS2D, enabling the adaptation iterative process.
  • Figure 2: Detailed architecture of the StereoAdapter-2: Our model iteratively refines disparity by integrating a Mamba Adapter. The refinement step is powered by the ConvSS2D operator, which enables adaptive and long-range spatial information propagation through multi-directional selective scanning.
  • Figure 3: Data synthesis pipeline. Semantic-aware style transfer and geometry-consistent novel view synthesis rendering pipeline for UW-StereoDepth-80K dataset.
  • Figure 4: Qualitative results of zero-shot stereo depth estimation
  • Figure 5: Qualitative results of zero-shot underwater stereo depth estimation were obtained by deploying the model on a robotic platform.
  • ...and 4 more figures