Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency
Woonghyun Ka, Jae Young Lee, Jaehyun Choi, Junmo Kim
TL;DR
Self-supervised monocular depth estimation often suffers from errors in pseudo-depth generated by stereo-matching networks. This work proposes a GT-free filtering mechanism based on consistency across multiple disparity maps obtained via disparity plane sweep, producing a weight map to down-weight unreliable regions during training. The weight map modulates the depth regression loss, enabling the monocular network to learn from accurate pseudo-depth without additional GT or stereo-confidence training. Experiments on KITTI Eigen split and Cityscapes demonstrate improved accuracy and robustness across backbone and stereo-network configurations, with qualitative gains at object boundaries and in challenging regions.
Abstract
In stereo-matching knowledge distillation methods for self-supervised monocular depth estimation, the knowledge of a stereo-matching network is distilled into a monocular depth network through pseudo-depth maps. These methods generally use a learning-based stereo-confidence network to identify errors in the pseudo-depth maps so that the errors are not transferred. However, learning-based stereo-confidence networks must be trained with ground truth (GT), which is not available in a self-supervised setting. In this paper, we propose a method that identifies and filters errors in the pseudo-depth map by checking the consistency of multiple disparity maps, requiring neither GT nor a training process. Experimental results show that the proposed method outperforms previous methods and works well across various configurations by filtering out erroneous areas where stereo matching is vulnerable, such as textureless regions, occlusion boundaries, and reflective surfaces.
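The filtering idea can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the agreement rule (per-pixel median plus a pixel threshold), the function names `consistency_weight` and `weighted_l1_loss`, and the `threshold` parameter are all assumptions made for illustration. The sketch also assumes the multiple disparity maps have already been produced for the same view; the disparity plane sweep that generates them is not shown.

```python
import numpy as np


def consistency_weight(disparities, threshold=1.0):
    """Per-pixel weight from agreement among several disparity maps.

    disparities: array-like of shape (K, H, W), K disparity maps of one view.
    threshold:   hypothetical tolerance in pixels (an assumption, not from the paper).
    Returns an (H, W) array in [0, 1]: the fraction of maps whose disparity
    lies within `threshold` of the per-pixel median.
    """
    disparities = np.asarray(disparities, dtype=np.float64)
    median = np.median(disparities, axis=0)
    agree = np.abs(disparities - median) <= threshold
    return agree.mean(axis=0)


def weighted_l1_loss(pred, pseudo, weight, eps=1e-8):
    """L1 regression loss against the pseudo-depth, modulated by the weight map."""
    pred = np.asarray(pred, dtype=np.float64)
    pseudo = np.asarray(pseudo, dtype=np.float64)
    weight = np.asarray(weight, dtype=np.float64)
    err = np.abs(pred - pseudo)
    # Normalise by the total weight so filtered pixels do not shrink the loss scale.
    return float((weight * err).sum() / (weight.sum() + eps))


if __name__ == "__main__":
    # Three toy 2x2 disparity maps; two pixels contain one outlier each.
    maps = np.array([
        [[10.0, 20.0], [30.0, 40.0]],
        [[10.2, 20.1], [35.0, 40.0]],
        [[9.9, 27.0], [30.3, 40.5]],
    ])
    w = consistency_weight(maps, threshold=1.0)
    print(w)
```

Pixels where the maps disagree receive a lower weight, so their pseudo-depth contributes less to the regression loss. A soft weight is used here rather than a hard mask only to keep the example short; either choice is an assumption of this sketch.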
