MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting
Yoli Shavit, Jacob Goldberger
TL;DR
MoGU addresses the lack of explicit uncertainty in standard Mixture-of-Experts (MoE) models by having each expert output a Gaussian (mean and variance) and by gating the ensemble on the experts' predicted uncertainty. By replacing the traditional gating network with inverse-variance-based weights and using a Gaussian mixture likelihood, MoGU yields calibrated predictive distributions with mean $\hat{y}$ and variance $\mathrm{Var}(y \mid x)$, and it is evaluated on time series forecasting across varied expert architectures and forecast horizons. Across eight benchmarks, MoGU improves forecasting accuracy over single-expert and conventional MoE baselines, and its uncertainty estimates correlate positively with prediction errors, particularly for the aleatoric component. This approach enhances reliability and interpretability in regression tasks and offers a pathway to integrating probabilistic gating into broader MoE and sparse-latent architectures.
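As a rough illustration of the predictive mean and variance mentioned above, the moments of a Gaussian mixture can be written out explicitly. The notation is an assumption made here, not taken from the paper: $k$ experts, expert $i$ predicting mean $\mu_i$ and variance $\sigma_i^2$, and normalized inverse-variance weights $w_i$. The split of the variance into an aleatoric and an epistemic term follows the standard law-of-total-variance reading and may differ in detail from the authors' definitions.

$$
w_i = \frac{\sigma_i^{-2}}{\sum_{j=1}^{k} \sigma_j^{-2}}, \qquad
\hat{y} = \sum_{i=1}^{k} w_i \mu_i, \qquad
\mathrm{Var}(y \mid x) = \underbrace{\sum_{i=1}^{k} w_i \sigma_i^2}_{\text{aleatoric}} + \underbrace{\sum_{i=1}^{k} w_i \left(\mu_i - \hat{y}\right)^2}_{\text{epistemic}}
$$

Under this reading, the first term averages the noise each expert predicts, while the second measures how much the experts disagree about the mean.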
Abstract
We introduce Mixture-of-Gaussians with Uncertainty-based Gating (MoGU), a novel Mixture-of-Experts (MoE) framework designed for regression tasks and applied to time series forecasting. Unlike conventional MoEs that provide only point estimates, MoGU models each expert's output as a Gaussian distribution. This allows it to directly quantify both the forecast (the mean) and its inherent uncertainty (the variance). MoGU's core innovation is its uncertainty-based gating mechanism, which replaces the traditional input-based gating network by using each expert's estimated variance to determine its contribution to the final prediction. Evaluated across diverse time series forecasting benchmarks, MoGU consistently outperforms single-expert models and traditional MoE setups. It also provides well-quantified, informative uncertainties that directly correlate with prediction errors, enhancing forecast reliability. Our code is available at: https://github.com/yolish/moe_unc_tsf
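The gating idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names (`inverse_variance_gate`, `mixture_forecast`, `mixture_nll`), the normalized inverse-variance form of the weights, and the small `eps` stabilizer are all assumptions introduced here. Each expert is assumed to have already produced a mean and a variance per sample.

```python
import numpy as np

def inverse_variance_gate(variances, eps=1e-8):
    """Hypothetical gate: weights proportional to 1/variance, normalized over experts."""
    precision = 1.0 / (variances + eps)
    return precision / precision.sum(axis=-1, keepdims=True)

def mixture_forecast(means, variances):
    """Return the gate weights and the weighted mean forecast.

    means, variances: arrays of shape (batch, n_experts).
    """
    weights = inverse_variance_gate(variances)
    y_hat = (weights * means).sum(axis=-1)
    return weights, y_hat

def mixture_nll(y, means, variances, eps=1e-8):
    """Mean negative log-likelihood of y under the gated Gaussian mixture."""
    weights = inverse_variance_gate(variances, eps)
    var = variances + eps
    # Per-expert Gaussian log-density of each target value.
    log_pdf = -0.5 * (np.log(2.0 * np.pi * var) + (y[..., None] - means) ** 2 / var)
    # Log-sum-exp over experts for numerical stability.
    a = np.log(weights) + log_pdf
    a_max = a.max(axis=-1, keepdims=True)
    log_mix = a_max.squeeze(-1) + np.log(np.exp(a - a_max).sum(axis=-1))
    return -log_mix.mean()

# Two experts, one sample: the more confident expert (variance 1.0)
# receives three times the weight of the less confident one (variance 3.0).
means = np.array([[1.0, 3.0]])
variances = np.array([[1.0, 3.0]])
weights, y_hat = mixture_forecast(means, variances)
loss = mixture_nll(np.array([1.5]), means, variances)
```

In this toy case the weights come out as roughly 0.75 and 0.25, so the combined forecast sits closer to the low-variance expert. No gating network is involved: the weights depend on the input only through the experts' own variance predictions.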
