Pseudo-Stereo Inputs: A Solution to the Occlusion Challenge in Self-Supervised Stereo Matching
Ruizhi Yang, Xingqiang Li, Jiajun Bai, Jinsong Du
TL;DR
This work introduces a pseudo-stereo inputs framework to address the occlusion problem in self-supervised stereo matching by decoupling inputs from feedback. A pseudo-image rendering process generates a second-view input, while the original pair provides reliable feedback, enabling symmetric learning across occluders and avoiding persistent directional errors. Through occlusion-aware loss design, gradient detaching, and a fully pseudo-stereo input variant, the method achieves strong improvements across KITTI, SceneFlow, and Spring datasets with multiple backbones, and ablations confirm the necessity of each component. The approach substantially advances self-supervised 3D perception by resolving a fundamental bottleneck, with potential future gains from multi-frame integration for textureless and reflective regions.
Abstract
Self-supervised stereo matching holds great promise because it eliminates the reliance on expensive ground-truth data. Its dominant paradigm, based on photometric consistency, is however fundamentally hindered by the occlusion challenge, an issue that persists regardless of network architecture. The essential insight is that, for any occluder, valid feedback signals can only be derived from the unoccluded areas on one side of it. Existing methods address this by focusing on the erroneous feedback from the other side, either identifying and removing it or introducing additional regularization to correct it. Nevertheless, these approaches have failed to provide a complete solution. This work proposes a more fundamental one. The core idea is to transform the fixed state, in which signals are valid on one side and erroneous on the other, into a probabilistic acquisition of valid feedback from both sides of an occluder. This is achieved through a complete framework centered on a pseudo-stereo inputs strategy that decouples the input from the feedback without introducing any additional constraints. Qualitative results show that the occlusion problem is resolved: performance is symmetrical and identical on both flanks of occluding objects. Quantitative experiments confirm significant performance improvements resulting from solving the occlusion challenge.
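The one-sided nature of photometric feedback can be seen in a toy one-dimensional example. The sketch below is only an illustration of the occlusion problem described above, not the method proposed in this work. The scanline, the disparity values, and the helper names `render_right` and `photometric_error` are all assumptions made for the example. It renders a right view from a left scanline and then evaluates the photometric error of the left view under the true disparity.

```python
# Toy 1-D illustration (hypothetical setup, not the paper's pipeline).
# Convention: a left-image pixel at x appears in the right image at x - d(x).

def render_right(left, disp):
    """Forward-warp a left scanline into the right view; nearer pixels win."""
    n = len(left)
    right = [None] * n          # None marks holes seen only by the right camera
    best = [-1] * n             # largest disparity written so far (z-buffer)
    for x in range(n):
        xr = x - disp[x]
        if 0 <= xr < n and disp[x] > best[xr]:
            right[xr] = left[x]
            best[xr] = disp[x]
    return right

def photometric_error(left, right, disp):
    """Absolute error between left[x] and right[x - d(x)]; None if unavailable."""
    n = len(left)
    errors = []
    for x in range(n):
        xr = x - disp[x]
        if 0 <= xr < n and right[xr] is not None:
            errors.append(abs(left[x] - right[xr]))
        else:
            errors.append(None)
    return errors

N = 20
FG_START, FG_END = 10, 13       # occluder spans pixels 10..13 (assumed)
is_fg = [FG_START <= x <= FG_END for x in range(N)]
left = [1000 + x if is_fg[x] else 10 * x for x in range(N)]
disp = [4 if is_fg[x] else 1 for x in range(N)]   # true disparities (assumed)

right = render_right(left, disp)
errors = photometric_error(left, right, disp)

# Pixels whose photometric error is nonzero even with the correct disparity.
occluded = [x for x, e in enumerate(errors) if e]
print(occluded)  # [7, 8, 9]
```

Even with perfect disparity, the background pixels immediately to the left of the occluder receive a nonzero error because the right camera sees the foreground there, while the pixels to its right match exactly. A network trained on this signal is therefore pushed toward wrong disparities on one flank only, which is the fixed asymmetry that the abstract refers to.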
