Temporally Consistent Stereo Matching
Jiaxi Zeng, Chengtang Yao, Yuwei Wu, Yunde Jia
TL;DR
This work tackles temporal inconsistency in video stereo matching with TC-Stereo, which combines three components: temporal disparity completion, which turns a semi-dense prior projected from the previous frame into a robust initialization; temporal state fusion, which produces temporally coherent hidden states; and dual-space refinement, which iterates in both the disparity and disparity-gradient spaces. The method builds on a cost-volume-based semi-dense map, a lightweight fusion module, and gradient-guided propagation that extends local surface constraints globally, improving estimates in ill-posed regions. Experiments on synthetic and real datasets show state-of-the-art temporal consistency and competitive accuracy at high efficiency, with online inference at frame rates suitable for practical applications. The approach is robust in occluded and reflective regions, and its main limitations are extremely dynamic scenes and camera pose errors. Even so, it offers clear advantages for online, temporally coherent depth estimation in stereo video pipelines.
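The idea of gradient-guided propagation can be illustrated with a minimal NumPy sketch. On a locally planar surface, a disparity gradient lets a known disparity at pixel q predict the disparity at a neighbor p through the first-order relation d(p) ≈ d(q) + ∇d(q)·(p − q). Repeating this step carries a local constraint across the whole map. This is not TC-Stereo's dual-space refinement module. The function name `propagate_with_gradients`, the 4-neighbor scheme, plain averaging of candidates, and filling only invalid pixels are all hypothetical choices made for illustration.

```python
import numpy as np


def propagate_with_gradients(disp, grad_x, grad_y, valid, num_iters=50):
    """Hypothetical sketch: fill invalid pixels by first-order extrapolation
    from valid 4-neighbors, using per-pixel disparity gradients."""
    d = np.where(valid, disp, 0.0).astype(np.float64)
    v = valid.astype(bool).copy()
    for _ in range(num_iters):
        acc = np.zeros_like(d)  # sum of candidate disparities per pixel
        cnt = np.zeros_like(d)  # number of valid candidates per pixel
        # Left neighbor (x-1): d(x) ~ d(x-1) + gx(x-1)
        acc[:, 1:] += np.where(v[:, :-1], d[:, :-1] + grad_x[:, :-1], 0.0)
        cnt[:, 1:] += v[:, :-1]
        # Right neighbor (x+1): d(x) ~ d(x+1) - gx(x+1)
        acc[:, :-1] += np.where(v[:, 1:], d[:, 1:] - grad_x[:, 1:], 0.0)
        cnt[:, :-1] += v[:, 1:]
        # Upper neighbor (y-1): d(y) ~ d(y-1) + gy(y-1)
        acc[1:, :] += np.where(v[:-1, :], d[:-1, :] + grad_y[:-1, :], 0.0)
        cnt[1:, :] += v[:-1, :]
        # Lower neighbor (y+1): d(y) ~ d(y+1) - gy(y+1)
        acc[:-1, :] += np.where(v[1:, :], d[1:, :] - grad_y[1:, :], 0.0)
        cnt[:-1, :] += v[1:, :]
        # Only pixels that are still invalid and have a valid neighbor are filled.
        fill = (~v) & (cnt > 0)
        if not fill.any():
            break
        d[fill] = acc[fill] / cnt[fill]
        v = v | fill
    return d, v
```

On a plane with constant gradients, a single known pixel is enough to recover the whole map once enough iterations have run, which is the intuition behind extending local surface constraints globally. With noisy predicted gradients, a learned module would presumably weight these candidates rather than average them uniformly.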
Abstract
Stereo matching estimates depth from binocular images for downstream applications. Most of these applications take video streams as input and require temporally consistent depth maps. However, existing methods focus mainly on single-frame estimation, which commonly leads to temporally inconsistent results, especially in ill-posed regions. In this paper, we leverage temporal information to improve the temporal consistency, accuracy, and efficiency of stereo matching. To this end, we formulate video stereo matching as temporal disparity completion followed by continuous iterative refinement. Specifically, we first project the disparity of the previous timestamp to the current viewpoint, obtaining a semi-dense disparity map. We then complete this map with a disparity completion module to obtain a well-initialized disparity map. The state features from the current completion module and from the past refinement are fused, providing a temporally coherent state for subsequent refinement. Based on this coherent state, we introduce a dual-space refinement module that iteratively refines the initialized result in both the disparity and disparity-gradient spaces, improving estimates in ill-posed regions. Extensive experiments demonstrate that our method effectively alleviates temporal inconsistency while enhancing both accuracy and efficiency.
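The first step, projecting the previous disparity to the current viewpoint to obtain a semi-dense map, can be sketched with standard pinhole geometry. The sketch makes several assumptions: rectified stereo with known intrinsics `K` and baseline, a known 4×4 relative pose `T_prev_to_cur` that maps points from the previous camera frame into the current one, and nearest-pixel splatting with a z-buffer. The function name `project_prev_disparity` is hypothetical, and the paper's actual projection may differ in these details. Disparity is converted to depth (Z = f·B/d), back-projected, transformed, re-projected, and converted back to disparity. Pixels that receive no point remain 0, which is what makes the result semi-dense.

```python
import numpy as np


def project_prev_disparity(disp_prev, K, baseline, T_prev_to_cur):
    """Hypothetical sketch: warp a previous-frame disparity map into the
    current view. Returns a semi-dense map in which 0 marks holes."""
    H, W = disp_prev.shape
    fx = K[0, 0]
    v, u = np.nonzero(disp_prev > 0)          # pixels with a valid disparity
    z = fx * baseline / disp_prev[v, u]        # disparity -> depth
    pix = np.stack([u, v, np.ones_like(u)]).astype(np.float64)  # 3xN homogeneous
    pts = np.linalg.inv(K) @ pix * z           # back-project to 3D (previous frame)
    R, t = T_prev_to_cur[:3, :3], T_prev_to_cur[:3, 3:4]
    pts_cur = R @ pts + t                      # move points into the current frame
    front = pts_cur[2] > 1e-6                  # drop points behind the camera
    proj = K @ pts_cur[:, front]
    u_cur = np.round(proj[0] / proj[2]).astype(int)
    v_cur = np.round(proj[1] / proj[2]).astype(int)
    d_cur = fx * baseline / pts_cur[2, front]  # depth -> disparity in the current view
    inside = (u_cur >= 0) & (u_cur < W) & (v_cur >= 0) & (v_cur < H)
    out = np.zeros((H, W))
    # z-buffer: when several points land on one pixel, keep the nearest
    # surface, i.e. the largest disparity.
    np.maximum.at(out, (v_cur[inside], u_cur[inside]), d_cur[inside])
    return out
```

For example, with fx = 100, a baseline of 0.5, and a fronto-parallel plane at disparity 10 (depth 5), translating the camera by 0.5 along x shifts every pixel 10 columns to the right. That shift leaves a 10-pixel-wide strip of holes on the left, the kind of gap a completion module would then have to fill.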
