Gaussian Mixture based Evidential Learning for Stereo Matching
Weide Liu, Xingxing Wang, Lu Wang, Jun Cheng, Fayao Liu, Xulei Yang
TL;DR
This work addresses uncertainty estimation in stereo matching by replacing the single-Gaussian assumption with a Gaussian mixture within an evidential learning framework. By introducing a Normal-Inverse-Gamma (NIG) prior for each mixture component and a mixture-based predictive distribution (a mixture of Student-t distributions as the marginal likelihood), the method jointly yields sharp depth predictions and rich epistemic and aleatoric uncertainty estimates. The approach, built on an STTR backbone, achieves state-of-the-art performance on Scene Flow and strong cross-domain generalization to KITTI and Middlebury, with a controllable number of mixture components and modest computational overhead. The results underscore the practical potential of multimodal, uncertainty-aware stereo depth estimation for robust real-world deployment.
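The mixture-of-Student-t likelihood described above can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each component is assumed to use the standard deep evidential regression parameterization (gamma, nu, alpha, beta), whose NIG marginal is a Student-t with 2*alpha degrees of freedom, location gamma and squared scale beta*(1+nu)/(nu*alpha). The class `NIGComponent`, the helper functions, and the rule of combining per-component uncertainties by their mixture weights are hypothetical choices made for this sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NIGComponent:
    """Hypothetical per-component evidential output for one pixel."""
    weight: float  # mixture weight pi_k (weights across components sum to 1)
    gamma: float   # predicted disparity (location of the component)
    nu: float      # > 0, evidence for the mean
    alpha: float   # > 1, so that the variances below are finite
    beta: float    # > 0


def student_t_log_pdf(y: float, c: NIGComponent) -> float:
    """Log-density of the NIG marginal: Student-t with 2*alpha dof,
    location gamma and scale^2 = beta * (1 + nu) / (nu * alpha)."""
    omega = 2.0 * c.beta * (1.0 + c.nu)
    return (
        math.lgamma(c.alpha + 0.5)
        - math.lgamma(c.alpha)
        - 0.5 * math.log(math.pi / c.nu)
        + c.alpha * math.log(omega)
        - (c.alpha + 0.5) * math.log(c.nu * (y - c.gamma) ** 2 + omega)
    )


def mixture_nll(y: float, comps: List[NIGComponent]) -> float:
    """Negative log-likelihood of a mixture of Student-t marginals,
    computed with log-sum-exp for numerical stability."""
    logs = [math.log(c.weight) + student_t_log_pdf(y, c) for c in comps]
    m = max(logs)
    return -(m + math.log(sum(math.exp(v - m) for v in logs)))


def predict(comps: List[NIGComponent]) -> Tuple[float, float, float]:
    """Weight-averaged disparity, aleatoric E[sigma^2] = beta / (alpha - 1)
    and epistemic Var[mu] = beta / (nu * (alpha - 1)).
    Averaging by mixture weight is an assumption of this sketch."""
    disparity = sum(c.weight * c.gamma for c in comps)
    aleatoric = sum(c.weight * c.beta / (c.alpha - 1.0) for c in comps)
    epistemic = sum(c.weight * c.beta / (c.nu * (c.alpha - 1.0)) for c in comps)
    return disparity, aleatoric, epistemic
```

For example, with `comps = [NIGComponent(0.7, 10.0, 2.0, 2.0, 1.0), NIGComponent(0.3, 14.0, 1.0, 3.0, 2.0)]`, `predict(comps)` returns approximately `(11.2, 1.0, 0.65)`, and `mixture_nll(12.0, comps)` gives the training loss for a ground-truth disparity of 12. Log-sum-exp is used because a component far from the ground truth can contribute a likelihood that underflows to zero.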
Abstract
In this paper, we introduce a novel Gaussian mixture based evidential learning solution for robust stereo matching. Unlike previous evidential deep learning approaches, which rely on a single Gaussian distribution, our framework assumes that individual image data in stereo matching follow a mixture-of-Gaussians distribution. This assumption yields more precise pixel-level predictions and better reflects the real-world image distribution. By further employing the inverse-Gamma distribution as an intermediary prior for each mixture component, our probabilistic model improves depth estimation over its single-Gaussian counterpart and effectively captures model uncertainty, which enables strong cross-domain generalization. We evaluated our method by training the model on the Scene Flow dataset and testing it on KITTI 2015 and Middlebury 2014. The experimental results consistently show that our method improves over the baseline methods while providing trustworthy uncertainty estimates. Notably, our approach achieves new state-of-the-art results on both the in-domain validation data and the cross-domain datasets, demonstrating its effectiveness and robustness in stereo matching.
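Placing an inverse-Gamma prior on every mixture component means a network head must output, for each pixel and each of K components, a mixture weight plus prior parameters that satisfy positivity constraints. The sketch below shows one way to enforce them. The raw-output layout, the softmax and softplus choices, the `eps` floor and the function names are assumptions for illustration (softplus constraints are a convention commonly used in deep evidential regression), not details confirmed by this paper.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(xs: List[float]) -> List[float]:
    """Softmax with max-subtraction to avoid overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def raw_to_mixture(
    raw: List[float], k: int, eps: float = 1e-6
) -> List[Tuple[float, float, float, float, float]]:
    """Map 5*k raw per-pixel outputs to k (pi, gamma, nu, alpha, beta) tuples.

    Hypothetical layout: raw = [logits(k), gamma(k), nu(k), alpha(k), beta(k)].
    """
    if len(raw) != 5 * k:
        raise ValueError(f"expected {5 * k} raw values, got {len(raw)}")
    logits, gam, nu, al, be = (raw[i * k:(i + 1) * k] for i in range(5))
    weights = softmax(logits)  # mixture weights sum to 1
    return [
        (
            weights[j],
            gam[j],                    # disparity left unconstrained in this sketch
            softplus(nu[j]) + eps,     # nu > 0
            1.0 + softplus(al[j]),     # alpha > 1 keeps beta / (alpha - 1) finite
            softplus(be[j]) + eps,     # beta > 0
        )
        for j in range(k)
    ]
```

Here `k` plays the role of the controllable number of mixture components: the head width grows linearly as 5*k values per pixel, which is consistent with a modest overhead for small K. The resulting tuples can then be scored with a mixture of Student-t marginals.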
