RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo

Jueun Ko, Hyewon Park, Hyesong Choi, Dongbo Min

TL;DR

RobIA tackles stereo depth estimation under continual domain shifts by coupling an instance-aware, parameter-efficient Attend-and-Excite Mixture-of-Experts (AttEx-MoE) with a Robust AdaptBN Teacher for dense, dual-source pseudo-supervision. AttEx-MoE enables input-specific adaptation while keeping the backbone frozen, and the AdaptBN Teacher provides supervision in low-confidence regions, mitigating pseudo-label sparsity. The approach achieves superior CTTA performance across dynamic target domains with favorable efficiency, demonstrating strong generalization and robustness in real-world deployment. This work advances CTTA for dense prediction by integrating instance-aware modulation and hybrid supervision, with practical implications for robust stereo-based perception in robotics and autonomous systems.

Abstract

Stereo depth estimation in real-world environments poses significant challenges due to dynamic domain shifts, sparse or unreliable supervision, and the high cost of acquiring dense ground-truth labels. While recent Test-Time Adaptation (TTA) methods offer promising solutions, most rely on static target domain assumptions and input-invariant adaptation strategies, limiting their effectiveness under continual shifts. In this paper, we propose RobIA, a novel Robust, Instance-Aware framework for Continual Test-Time Adaptation (CTTA) in stereo depth estimation. RobIA integrates two key components: (1) Attend-and-Excite Mixture-of-Experts (AttEx-MoE), a parameter-efficient module that dynamically routes input to frozen experts via a lightweight self-attention mechanism tailored to epipolar geometry, and (2) Robust AdaptBN Teacher, a PEFT-based teacher model that provides dense pseudo-supervision by complementing sparse handcrafted labels. This strategy enables input-specific flexibility and broad supervision coverage, improving generalization under domain shift. Extensive experiments demonstrate that RobIA achieves superior adaptation performance across dynamic target domains while maintaining computational efficiency.
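
To make the dual-source supervision idea concrete, the following is a minimal sketch of how sparse handcrafted labels could be complemented by a teacher prediction. The fusion rule used here (keep the handcrafted value wherever it is valid, fall back to the teacher elsewhere) is an assumption for illustration; the abstract does not state the exact rule. The names `fuse_pseudo_labels` and `coverage`, and the use of `None` to mark invalid pixels, are hypothetical.

```python
from typing import List, Optional

# A disparity map as nested lists; None marks a pixel with no valid label.
# (Hypothetical representation; a real pipeline would use tensors.)
DisparityMap = List[List[Optional[float]]]


def fuse_pseudo_labels(d_proxy: DisparityMap, d_teacher: DisparityMap) -> DisparityMap:
    """Assumed fusion rule: keep the handcrafted proxy where it is valid,
    otherwise fall back to the teacher prediction."""
    fused = []
    for proxy_row, teacher_row in zip(d_proxy, d_teacher):
        fused.append([t if p is None else p for p, t in zip(proxy_row, teacher_row)])
    return fused


def coverage(labels: DisparityMap) -> float:
    """Fraction of pixels that carry a label."""
    total = sum(len(row) for row in labels)
    valid = sum(v is not None for row in labels for v in row)
    return valid / total


# Toy 2x2 example: the proxy labels only half of the pixels.
d_proxy = [[1.0, None], [None, 4.0]]
d_teacher = [[1.5, 2.5], [3.5, 4.5]]
fused = fuse_pseudo_labels(d_proxy, d_teacher)

print(coverage(d_proxy))  # 0.5
print(coverage(fused))    # 1.0
print(fused)              # [[1.0, 2.5], [3.5, 4.0]]
```

In this toy case the fused map covers every pixel while leaving the handcrafted values untouched, which is the sense in which the teacher "complements" the sparse labels.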

Paper Structure

This paper contains 24 sections, 6 equations, 5 figures, 14 tables.

Figures (5)

  • Figure 1: Overview of RobIA. At test time, the student model is trained on dense pseudo-labels generated by combining the sparse handcrafted proxy $D_\text{proxy}$ with the Robust Teacher prediction $D_\text{teacher}$, ensuring stable adaptation under dynamic conditions. AttEx-MoE integrates a row-wise self-attention router and a gating network $G$ into the deep encoder blocks. The router extracts global context from an input feature map $z$; this context is then processed by the gating network. The student backbone is kept frozen, and only the router, the gating network, and the regression parameters of the decoder are updated.
  • Figure 2: D1-all error rate over 10 adaptation rounds in different pseudo-label regions. We separate evaluation into (left) the entire image, (middle) regions with valid handcrafted pseudo-labels, and (right) regions without reliable supervision (invalid).
  • Figure 3: Pseudo-labels (top row) and predictions (bottom row) after ten adaptation rounds. (a) shows the input left image. We visualize the sparse handcrafted pseudo-label (b), the dense pseudo-label produced with the AdaptBN teacher (c), and the student predictions of AttEx-MoE trained with sparse (d) and dense (e) supervision.
  • Figure 4: Qualitative results for cloudy sequences in the DrivingStereo dataset.
  • Figure 5: Qualitative results for rainy sequences in the DrivingStereo dataset.
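
The region-wise evaluation described for Figure 2 can be illustrated with a short sketch. It assumes the common KITTI-style definition of D1-all (a pixel is an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity); the page itself does not spell out the definition, so treat the thresholds as an assumption. The function name `d1_all` and the flat-list layout are hypothetical.

```python
from typing import List


def d1_all(pred: List[float], gt: List[float], region: List[bool]) -> float:
    """Percentage of outlier pixels inside `region`.

    Assumed KITTI-style rule: a pixel is an outlier when its absolute
    disparity error is > 3 px AND > 5% of the ground-truth disparity.
    """
    outliers = 0
    count = 0
    for p, g, keep in zip(pred, gt, region):
        if not keep:
            continue  # pixel lies outside the evaluated region
        err = abs(p - g)
        count += 1
        if err > 3.0 and err > 0.05 * g:
            outliers += 1
    return 100.0 * outliers / count if count else float("nan")


# Toy flattened image with four pixels.
pred = [10.0, 20.0, 30.0, 40.0]
gt = [10.0, 24.0, 30.0, 50.0]
proxy_valid = [True, False, True, False]  # where a handcrafted label exists

entire = d1_all(pred, gt, [True] * len(pred))
valid = d1_all(pred, gt, proxy_valid)
invalid = d1_all(pred, gt, [not v for v in proxy_valid])

print(entire, valid, invalid)  # 50.0 0.0 100.0
```

Splitting the metric this way shows whether errors concentrate in pixels that had no handcrafted supervision, which is the comparison the three panels of Figure 2 draw.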