Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Xianqi Wang, Gangwei Xu, Hao Jia, Xin Yang

TL;DR

This paper proposes the Selective Recurrent Unit (SRU), a novel iterative update operator for stereo matching. It lets the network aggregate hidden disparity information across multiple frequencies, reducing the risk of losing vital hidden disparity information during iterative updates.

Abstract

Stereo matching methods based on iterative optimization, like RAFT-Stereo and IGEV-Stereo, have become a cornerstone of the field. However, these methods struggle to simultaneously capture high-frequency information in edges and low-frequency information in smooth regions because of their fixed receptive field. As a result, they tend to lose details, blur edges, and produce false matches in textureless areas. In this paper, we propose the Selective Recurrent Unit (SRU), a novel iterative update operator for stereo matching. The SRU module can adaptively fuse hidden disparity information at multiple frequencies for edge and smooth regions. To perform adaptive fusion, we introduce a new Contextual Spatial Attention (CSA) module that generates attention maps used as fusion weights. The SRU empowers the network to aggregate hidden disparity information across multiple frequencies, mitigating the risk of losing vital hidden disparity information during iterative processes. To verify SRU's universality, we apply it to representative iterative stereo matching methods, collectively referred to as Selective-Stereo. Our Selective-Stereo ranks $1^{st}$ on the KITTI 2012, KITTI 2015, ETH3D, and Middlebury leaderboards among all published methods. Code is available at https://github.com/Windsrain/Selective-Stereo.
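
The adaptive fusion described in the abstract can be illustrated with a minimal sketch. It assumes the attention map weights two hidden-state branches per pixel as a convex combination: one branch from a small receptive field (high-frequency detail) and one from a large receptive field (low-frequency, smooth regions). The function name `fuse_hidden_states`, the single-channel list-of-lists layout, and the choice of which branch receives the attention weight are illustrative assumptions, not the authors' implementation; the recurrent branches themselves are omitted.

```python
from typing import List

# One H x W channel of a hidden state or attention map (hypothetical layout).
Grid = List[List[float]]


def fuse_hidden_states(attention: Grid, hidden_small: Grid, hidden_large: Grid) -> Grid:
    """Blend two hidden-state maps per pixel using an attention map.

    Assumption: attention values lie in [0, 1]. A value near 1 favors the
    small-receptive-field branch, a value near 0 favors the large one.
    """
    fused = []
    for a_row, s_row, l_row in zip(attention, hidden_small, hidden_large):
        # Convex combination: a * small + (1 - a) * large at every pixel.
        fused.append([a * s + (1.0 - a) * l for a, s, l in zip(a_row, s_row, l_row)])
    return fused


# Toy 2 x 2 example with made-up values.
attention = [[1.0, 0.0], [0.5, 0.25]]
hidden_small = [[2.0, 2.0], [2.0, 4.0]]
hidden_large = [[10.0, 10.0], [10.0, 8.0]]
fused = fuse_hidden_states(attention, hidden_small, hidden_large)
print(fused)  # [[2.0, 10.0], [6.0, 7.0]]
```

In this toy example, a pixel with attention 1.0 keeps only the small-receptive-field value, a pixel with attention 0.0 keeps only the large-receptive-field value, and intermediate weights interpolate between the two.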

Paper Structure

This paper contains 14 sections, 6 equations, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: Row 1: Comparisons with state-of-the-art stereo methods on the KITTI 2012 [geiger2012we], KITTI 2015 [menze2015object], ETH3D [schops2017multi], and Middlebury [scharstein2014high] leaderboards. Row 2: Visual comparison with RAFT-Stereo on ETH3D. Row 3: Visual comparison with IGEV-Stereo on Middlebury. Our method distinguishes subtle details and sharp edges and performs well in weak texture regions.
  • Figure 2: Overview of our proposed Selective-Stereo (Selective-RAFT version). The Contextual Spatial Attention (CSA) module extracts attention maps from context information as a guide for Selective Recurrent Units (SRUs). Then the network iteratively updates the disparity using local cost volumes retrieved from the correlation pyramid and attention maps given by CSA through SRUs.
  • Figure 3: Architecture of the proposed modules. Left: Contextual Spatial Attention (CSA) module. Right: Selective Recurrent Unit (SRU).
  • Figure 4: Multi-level SRU. Information is passed between SRUs at adjacent resolutions. Dashed arrows represent upsampling and downsampling operations. At $1/4$ resolution, the disparity and local cost volume are fed into the SRU as additional inputs.
  • Figure 5: Qualitative results on the test set of KITTI. Our Selective-IGEV outperforms IGEV in detailed and weak texture regions.
  • ...and 2 more figures