Pseudo-Stereo Inputs: A Solution to the Occlusion Challenge in Self-Supervised Stereo Matching

Ruizhi Yang, Xingqiang Li, Jiajun Bai, Jinsong Du

TL;DR

This work introduces a pseudo-stereo inputs framework to address the occlusion problem in self-supervised stereo matching by decoupling inputs from feedback. A pseudo-image rendering process generates a second-view input, while the original pair provides reliable feedback, enabling symmetric learning across occluders and avoiding persistent directional errors. Through occlusion-aware loss design, gradient detaching, and a fully pseudo-stereo input variant, the method achieves strong improvements across KITTI, SceneFlow, and Spring datasets with multiple backbones, and ablations confirm the necessity of each component. The approach substantially advances self-supervised 3D perception by resolving a fundamental bottleneck, with potential future gains from multi-frame integration for textureless and reflective regions.

Abstract

Self-supervised stereo matching holds great promise because it removes the reliance on expensive ground-truth data. Its dominant paradigm, based on photometric consistency, is however fundamentally hindered by occlusion, an issue that persists regardless of network architecture. The essential insight is that, for any occluder, valid feedback signals can only be derived from the unoccluded areas on one side of it. Existing methods focus on the erroneous feedback from the other side, either identifying and removing it or introducing additional regularization terms to correct it, but none of them provides a complete solution. This work proposes a more fundamental one. The core idea is to transform the fixed state, in which feedback is valid on one side of an occluder and erroneous on the other, into a probabilistic acquisition of valid feedback from both sides. This is achieved through a complete framework centered on a pseudo-stereo inputs strategy that decouples the input from the feedback without introducing any additional constraints. Qualitative results show that the occlusion problem is resolved: performance is symmetric on both flanks of occluding objects. Quantitative experiments validate the significant performance improvements that result from solving the occlusion challenge.
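
The one-sided nature of photometric feedback can be illustrated on a single scanline. The sketch below is not the paper's implementation: the toy scene, the integer disparities, and the helper names `reconstruct_left` and `mismatched_pixels` are all assumptions made for illustration. It backward-warps the right view into the left view using the true disparity and lists the pixels that still mismatch, then repeats the check after a flip-and-swap augmentation of the kind the Figure 1 caption calls ineffective.

```python
def reconstruct_left(right, disp):
    """Backward-warp the right scanline into the left view: left_hat[x] = right[x - d(x)]."""
    last = len(right) - 1
    return [right[min(max(x - d, 0), last)] for x, d in enumerate(disp)]


def mismatched_pixels(left, right, disp):
    """Left-view pixels whose photometric error is nonzero under the given disparity."""
    left_hat = reconstruct_left(right, disp)
    return [x for x, (a, b) in enumerate(zip(left, left_hat)) if a != b]


# Toy scene (all values are made up): a textured background at disparity 0
# and a 3-pixel foreground object at disparity 2.
WIDTH, FG_DISP, FG_START = 12, 2, 5
background = [10 * x for x in range(WIDTH)]
foreground = [200, 201, 202]

left, right = list(background), list(background)
left_disp, right_disp = [0] * WIDTH, [0] * WIDTH
for i, value in enumerate(foreground):
    left[FG_START + i] = value
    left_disp[FG_START + i] = FG_DISP
    right[FG_START - FG_DISP + i] = value
    right_disp[FG_START - FG_DISP + i] = FG_DISP

# Even with the true disparity, the background strip hidden in the right view mismatches.
bad = mismatched_pixels(left, right, left_disp)

# Flip-and-swap: the mirrored right view becomes the new left view, and vice versa.
flipped_left, flipped_right = right[::-1], left[::-1]
flipped_disp = right_disp[::-1]
flipped_fg_start = flipped_disp.index(FG_DISP)
bad_flipped = mismatched_pixels(flipped_left, flipped_right, flipped_disp)

print(bad, FG_START)                  # [3, 4] 5
print(bad_flipped, flipped_fg_start)  # [4, 5] 6
```

In both cases the mismatching pixels sit immediately to the left of the foreground object in the reference view, so the erroneous feedback always comes from the same side relative to the inputs, even though the disparity used is exact.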

Paper Structure

This paper contains 17 sections, 5 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: An illustration of our motivation. (Top) The fixed-side occlusion problem where standard flipping is ineffective, as the occlusion side remains constant relative to the inputs. (Bottom) Our method decouples inputs and feedback using a pseudo-view (C), breaking the persistent directional error signal.
  • Figure 2: An overview of our proposed pipeline. We leverage a pseudo-input to decouple network inputs and feedback. This enables a symmetric training paradigm where occlusions are probabilistically relocated, providing valid feedback for both sides of occluders. An occlusion estimation and gradient detaching mechanism is employed to resolve the inherent feedback conflicts.
  • Figure 3: A comparison between the existing self-supervised pipeline and our pseudo-stereo inputs strategy. In contrast to existing methods, our method decouples the input stereo images from the feedback stereo images. This allows for flexible feedback using image pairs with different occlusion positions.
  • Figure 4: Schematic diagram of pseudo-image generation using the rendering method, where Backbone is the model currently under training rather than a previously well-trained one. The subscript "Flip" denotes horizontal flipping.
  • Figure 5: Visual demonstration of the pseudo-image generation process. Small-scale pixel loss and large-scale missing pixels at the image edges are mitigated using padding and cropping.
  • ...and 3 more figures
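
The captions for Figures 4 and 5 refer to rendering a pseudo-image from a disparity estimate and to the missing pixels this produces. This excerpt does not give the paper's actual rendering procedure, so the following is only a generic forward-warping sketch under stated assumptions: a single scanline, integer disparities, a nearest-surface-wins rule, and a hypothetical helper `render_right_view`. It shows where holes appear in a rendered second view.

```python
def render_right_view(left, left_disp, hole=None):
    """Forward-warp a left scanline: left pixel x lands at x - d(x); nearer pixels win."""
    width = len(left)
    rendered = [hole] * width
    best_disp = [-1] * width  # largest disparity written so far at each target pixel
    for x, (value, d) in enumerate(zip(left, left_disp)):
        target = x - d
        if 0 <= target < width and d > best_disp[target]:
            rendered[target] = value
            best_disp[target] = d
    return rendered


# Toy scene (made-up values): background at disparity 0, foreground at disparity 2.
WIDTH, FG_DISP, FG_START = 12, 2, 5
background = [10 * x for x in range(WIDTH)]
foreground = [200, 201, 202]

left, true_right = list(background), list(background)
left_disp = [0] * WIDTH
for i, value in enumerate(foreground):
    left[FG_START + i] = value
    left_disp[FG_START + i] = FG_DISP
    true_right[FG_START - FG_DISP + i] = value

pseudo_right = render_right_view(left, left_disp)
holes = [x for x, v in enumerate(pseudo_right) if v is None]

print(pseudo_right)  # [0, 10, 20, 200, 201, 202, None, None, 80, 90, 100, 110]
print(holes)         # [6, 7]
```

The holes are the background pixels that the left view never observed because the foreground object covered them. Everywhere else the rendered scanline agrees with the true right view of this toy scene. How the paper fills or avoids such gaps, beyond the padding and cropping mentioned in the Figure 5 caption, is not specified in this excerpt.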