Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

Rui Gong, Weide Liu, Zaiwang Gu, Xulei Yang, Jun Cheng

TL;DR

This work tackles stereo matching by injecting both intra-view and cross-view geometric knowledge into learning-based disparity estimation. It introduces ICGNet, which adds two decoders to a stereo matching network: an intra-view decoder guided by a pre-trained interest-point detector, and a cross-view decoder guided by a pre-trained interest-point matcher and ground-truth correspondences. The network is optimized with L_intra, L_cross-soft, and L_cross-hard losses in addition to the standard disparity loss. Empirically, ICGNet achieves state-of-the-art performance on SceneFlow and strong cross-domain generalization to KITTI and Middlebury, and it adds no inference overhead because the decoders are discarded at test time. The approach shows that geometric priors from local feature matching can improve disparity estimation, which is relevant to robust stereo in textureless or occluded regions.
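
The training-only role of the two decoders can be illustrated with a minimal sketch. This is not the authors' implementation: the class `StereoWithAuxDecoders`, the function `total_loss`, the wiring of the decoders to the stereo features, and the loss weights are all assumptions made for illustration, since the summary does not give these details.

```python
class StereoWithAuxDecoders:
    """Stereo network plus two training-only auxiliary decoders (hypothetical wiring)."""

    def __init__(self, stereo_net, intra_decoder, cross_decoder):
        self.stereo_net = stereo_net        # assumed to return (left_feats, right_feats, disparity)
        self.intra_decoder = intra_decoder  # assumed: features -> interest-point map
        self.cross_decoder = cross_decoder  # assumed: (left_feats, right_feats) -> correspondences

    def forward(self, left, right, training):
        left_feats, right_feats, disparity = self.stereo_net(left, right)
        outputs = {"disparity": disparity}
        if training:
            # The auxiliary branches run only during training, so they
            # add no cost when the model is used for inference.
            outputs["interest_map"] = self.intra_decoder(left_feats)
            outputs["matches"] = self.cross_decoder(left_feats, right_feats)
        return outputs


def total_loss(l_disp, l_intra, l_cross_soft, l_cross_hard,
               w_intra=1.0, w_soft=1.0, w_hard=1.0):
    """Disparity loss plus the three auxiliary terms.

    The weights are hypothetical placeholders; the summary does not state them.
    """
    return l_disp + w_intra * l_intra + w_soft * l_cross_soft + w_hard * l_cross_hard
```

With `training=False`, `forward` returns only the disparity and never calls either decoder, which is the sense in which the extra supervision is free at test time.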

Abstract

Geometric knowledge has been shown to be beneficial for the stereo matching task. However, prior attempts to integrate geometric insights into stereo matching algorithms have largely focused on geometric knowledge from single images while crucial cross-view factors such as occlusion and matching uniqueness have been overlooked. To address this gap, we propose a novel Intra-view and Cross-view Geometric knowledge learning Network (ICGNet), specifically crafted to assimilate both intra-view and cross-view geometric knowledge. ICGNet harnesses the power of interest points to serve as a channel for intra-view geometric understanding. Simultaneously, it employs the correspondences among these points to capture cross-view geometric relationships. This dual incorporation empowers the proposed ICGNet to leverage both intra-view and cross-view geometric knowledge in its learning process, substantially improving its ability to estimate disparities. Our extensive experiments demonstrate the superiority of the ICGNet over contemporary leading models.

Paper Structure (15 sections, 4 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: An illustration of our overall framework using synthetic shapes. We leverage intra-view geometric knowledge (how to extract geometric structures within a single view) and cross-view geometric knowledge (how those structures correspond across views) to aid the stereo matching task. The dotted lines are for illustration only and are not used in the method.
  • Figure 2: Overall structure of our proposed framework. The model architecture comprises three parts: a stereo matching network, a cross-view knowledge learning network, and an intra-view knowledge learning network. The cross-view knowledge learning network introduces cross-view geometric knowledge by aligning the interest point correspondences $\mathbf{P}'$, $\mathbf{P}$ and $\mathcal{P}^{gt}$ using $\mathcal{L}_{\text{cross-hard}}$ and $\mathcal{L}_{\text{cross-soft}}$. The intra-view knowledge learning network introduces intra-view geometric knowledge by aligning the interest point maps $\mathbf{M}'$ and $\mathbf{M}$ using $\mathcal{L}_{\text{intra}}$. The dotted lines are for illustration only and are not used in our work.
  • Figure 3: Qualitative results of ICGNet (ours) compared with the state-of-the-art method IGEV-Stereo.
  • Figure 4: Qualitative results of cross-domain generalization of ICGNet (ours) compared with the state-of-the-art baseline IGEV-Stereo.
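
The ground-truth correspondences $\mathcal{P}^{gt}$ mentioned in the Figure 2 caption can be illustrated with a short sketch. It assumes a rectified stereo pair and a left-referenced ground-truth disparity map, in which case a left-image point $(x, y)$ with disparity $d$ corresponds to the right-image point $(x - d, y)$. Whether the paper constructs $\mathcal{P}^{gt}$ exactly this way is not stated here; the function name `gt_correspondences` and the validity rules are assumptions.

```python
def gt_correspondences(points, disparity, width):
    """Map left-image interest points to right-image matches via disparity.

    points:    list of (x, y) integer pixel coordinates in the left image
    disparity: 2D list indexed as disparity[y][x], left-referenced
    width:     image width in pixels
    Returns a list of ((x_left, y), (x_right, y)) pairs.
    """
    matches = []
    for x, y in points:
        d = disparity[y][x]
        if d <= 0:
            # Assumption: non-positive values mark missing ground truth.
            continue
        x_right = x - d  # rectified geometry: matches lie on the same row
        if 0 <= x_right <= width - 1:
            matches.append(((x, y), (x_right, y)))
        # Points whose match falls outside the right image are dropped.
    return matches


# Usage on a one-row toy disparity map.
toy_disparity = [[0.0, 2.0, 1.5, 3.0]]
toy_points = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(gt_correspondences(toy_points, toy_disparity, width=4))
```

This sketch only discards points with invalid disparity or out-of-image matches. Detecting occluded points would need additional information, such as a right-referenced disparity map for a left-right consistency check.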