Gaussian Mixture based Evidential Learning for Stereo Matching

Weide Liu, Xingxing Wang, Lu Wang, Jun Cheng, Fayao Liu, Xulei Yang

TL;DR

This work addresses uncertainty estimation in stereo matching by replacing the single-Gaussian assumption with a Gaussian mixture within an evidential learning framework. By introducing per-component Normal Inverse Gamma priors and a mixture-based predictive distribution (a mixture of Student-t distributions for the marginal likelihood), the method jointly yields sharp depth predictions and both epistemic and aleatoric uncertainty estimates. The approach, built on an STTR backbone, demonstrates state-of-the-art performance on Scene Flow and strong cross-domain generalization to KITTI and Middlebury, with a controllable number of mixture components and modest computational overhead. The results underscore the practical potential of multimodal, uncertainty-aware stereo depth estimation for robust real-world deployment.

Abstract

In this paper, we introduce a novel Gaussian mixture based evidential learning solution for robust stereo matching. Diverging from previous evidential deep learning approaches that rely on a single Gaussian distribution, our framework posits that individual image data adheres to a mixture-of-Gaussian distribution in stereo matching. This assumption yields more precise pixel-level predictions and more accurately mirrors the real-world image distribution. By further employing the inverse-Gamma distribution as an intermediary prior for each mixture component, our probabilistic model achieves improved depth estimation compared to its single-Gaussian counterpart and effectively captures the model uncertainty, which enables strong cross-domain generalization. We evaluated our method for stereo matching by training the model on the Scene Flow dataset and testing it on KITTI 2015 and Middlebury 2014. The experimental results consistently show that our method brings improvements over the baseline methods in a trustworthy manner. Notably, our approach achieved new state-of-the-art results on both the in-domain validation data and the cross-domain datasets, demonstrating its effectiveness and robustness in stereo matching tasks.
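
As a minimal sketch of the marginal likelihood mentioned above: integrating a Gaussian likelihood against a Normal Inverse Gamma prior gives a Student-t density, so a mixture of such components gives a mixture of Student-t densities. The parameter names (`gamma`, `nu`, `alpha`, `beta`), the function names, and the explicit mixture weights are assumptions made for illustration, following the common deep evidential regression parameterization. They are not taken from the authors' implementation.

```python
import math


def student_t_logpdf(y, loc, scale, df):
    """Log-density of a location-scale Student-t distribution."""
    z = (y - loc) / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )


def nig_predictive_logpdf(y, gamma, nu, alpha, beta):
    """Log marginal likelihood of y under one Normal Inverse Gamma component.

    Marginalizing (mu, sigma^2) yields a Student-t with 2*alpha degrees of
    freedom, location gamma, and squared scale beta * (1 + nu) / (nu * alpha).
    """
    scale = math.sqrt(beta * (1.0 + nu) / (nu * alpha))
    return student_t_logpdf(y, gamma, scale, 2.0 * alpha)


def mixture_nll(y, weights, components):
    """Negative log-likelihood of y under a weighted mixture of Student-t terms.

    weights: positive mixture weights that sum to 1 (assumed, not checked).
    components: one (gamma, nu, alpha, beta) tuple per mixture component.
    """
    log_terms = [
        math.log(w) + nig_predictive_logpdf(y, *params)
        for w, params in zip(weights, components)
    ]
    # Log-sum-exp keeps the sum stable when individual terms are very small.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))
```

For a single pixel with two hypothetical disparity modes, `mixture_nll(10.0, [0.5, 0.5], [(10.0, 1.0, 2.0, 1.0), (40.0, 1.0, 2.0, 1.0)])` scores a target that sits on the first mode. A target lying between the two modes receives a higher loss, which is the behaviour a single Gaussian cannot express.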

Paper Structure (21 sections, 28 equations, 5 figures, 5 tables, 1 algorithm)

Figures (5)

  • Figure 1: Illustration of the proposed pipeline for mixture-of-Gaussian based evidential learning. Given a set of images as input, our objective is to train a network to estimate the parameters of a mixture-of-Gaussian evidential distribution. Rather than predicting depth maps alone, our approach concurrently predicts epistemic uncertainties by estimating the evidential distributions. This is achieved by hierarchically modeling the targets with a Gaussian mixture likelihood, where each component is characterized by its own likelihood parameters $(\mu_k, \sigma_k^2)$. Additionally, a higher-order Normal Inverse Gamma distribution is placed over each set of unknowns $(\mu_k, \sigma_k^2)$.
  • Figure 2: Given a dataset, our proposed mixture-of-Gaussian evidential estimation method selects the most suitable Gaussian distribution. A component with lower variance captures concentrated data points. Conversely, datasets with diverse samples and therefore higher variance (e.g., long-tailed distributions) can be captured by components with higher variance. With the mixture-of-Gaussian approach, we can flexibly model targets with more complex distributions. The shading intensity corresponds to the probability mass, with darker shading indicating higher probability. Our objective is to train a model that predicts the target $y$ from an input $x$ while incorporating an evidential prior on the likelihood to facilitate uncertainty estimation.
  • Figure 3: Depth Estimation and Epistemic Uncertainty. This figure includes (a) Input images; (b) Pixel-wise depth predictions; (c) Error maps; (d) Epistemic uncertainty maps.
  • Figure 4: Effect of adversarial perturbation on predictions and uncertainty for the EL and MOG EL frameworks. (a) Left input images; (b, c) predictions of EL and MOG EL; (d, e) error maps of EL and MOG EL; (f, g) uncertainties of EL and MOG EL.
  • Figure 5: Illustrations of the components of the Gaussian mixture of images.
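
The captions for Figures 1 and 3 refer to epistemic uncertainty obtained from per-component Normal Inverse Gamma distributions. The sketch below shows the standard closed-form moments of one such distribution and one possible way to combine them across components. The parameter names and, in particular, the weight-averaged aggregation in `mixture_summary` are assumptions for illustration. The paper may combine components differently.

```python
def nig_moments(gamma, nu, alpha, beta):
    """Return (prediction, aleatoric, epistemic) for one NIG component.

    prediction = E[mu]        = gamma
    aleatoric  = E[sigma^2]   = beta / (alpha - 1)
    epistemic  = Var[mu]      = beta / (nu * (alpha - 1))
    The variances are finite only when alpha > 1.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for finite variance")
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic


def mixture_summary(weights, components):
    """Weight-average the per-component moments (an assumed aggregation rule).

    weights: mixture weights that sum to 1 (assumed, not checked).
    components: one (gamma, nu, alpha, beta) tuple per mixture component.
    """
    moments = [nig_moments(*params) for params in components]
    prediction = sum(w * m[0] for w, m in zip(weights, moments))
    aleatoric = sum(w * m[1] for w, m in zip(weights, moments))
    epistemic = sum(w * m[2] for w, m in zip(weights, moments))
    return prediction, aleatoric, epistemic
```

Applied per pixel, the third returned value would correspond to an epistemic uncertainty map like the one in Figure 3(d). Note that a small `nu` (little evidence) inflates the epistemic term while leaving the aleatoric term unchanged.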