MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting

Yoli Shavit, Jacob Goldberger

TL;DR

MoGU addresses the lack of explicit uncertainty in standard Mixture-of-Experts by making each expert output a Gaussian (mean and variance) and gating the ensemble using expert uncertainty. By replacing the traditional gating network with inverse-variance-based weights and a Gaussian mixture likelihood, MoGU yields calibrated predictive distributions with mean $\hat{y}$ and variance $\mathrm{Var}(y \mid x)$, and is shown to be effective for time series forecasting across varied architectures and horizons. Across eight benchmarks, MoGU improves forecasting accuracy over single-expert and conventional MoE baselines, while providing uncertainty estimates that positively correlate with prediction errors, particularly for aleatoric uncertainty. This approach enhances reliability and interpretability in regression tasks and offers a pathway to integrating probabilistic gating into broader MoE and sparse-latent architectures.

Abstract

We introduce Mixture-of-Gaussians with Uncertainty-based Gating (MoGU), a novel Mixture-of-Experts (MoE) framework designed for regression tasks and applied to time series forecasting. Unlike conventional MoEs that provide only point estimates, MoGU models each expert's output as a Gaussian distribution. This allows it to directly quantify both the forecast (the mean) and its inherent uncertainty (variance). MoGU's core innovation is its uncertainty-based gating mechanism, which replaces the traditional input-based gating network by using each expert's estimated variance to determine its contribution to the final prediction. Evaluated across diverse time series forecasting benchmarks, MoGU consistently outperforms single-expert models and traditional MoE setups. It also provides well-quantified, informative uncertainties that directly correlate with prediction errors, enhancing forecast reliability. Our code is available from: https://github.com/yolish/moe_unc_tsf
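The gating idea described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the gate weights are normalized inverse variances, and it uses the standard mixture decomposition (weighted average of expert variances as the aleatoric part, weighted spread of expert means as the epistemic part). The function names `inverse_variance_weights`, `mixture_moments`, and `mixture_nll` are hypothetical and chosen only for illustration.

```python
import math


def inverse_variance_weights(variances):
    """Assumed gate: w_i = (1 / var_i) / sum_j (1 / var_j)."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def mixture_moments(means, variances, weights):
    """Mean and variance terms of a weighted Gaussian mixture.

    Returns (mean, aleatoric, epistemic); the total predictive
    variance is aleatoric + epistemic (law of total variance).
    """
    mean = sum(w * m for w, m in zip(weights, means))
    # Weighted average of the experts' own variances.
    aleatoric = sum(w * v for w, v in zip(weights, variances))
    # Weighted spread of the expert means around the mixture mean.
    epistemic = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, aleatoric, epistemic


def mixture_nll(y, means, variances, weights):
    """Negative log-likelihood of a scalar target y under the mixture."""
    density = sum(
        w * math.exp(-0.5 * (y - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )
    return -math.log(density)


# Two hypothetical experts: a confident one and an uncertain one.
means = [0.0, 5.0]
variances = [1.0, 4.0]
weights = inverse_variance_weights(variances)
mean, aleatoric, epistemic = mixture_moments(means, variances, weights)
total_variance = aleatoric + epistemic
```

In this toy case the expert with the smaller variance receives the larger weight, so the combined forecast sits closer to its mean. In a trained model the means and variances would come from the expert networks per forecast step and variable; the scalar version here only shows the arithmetic.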

Paper Structure

This paper contains 19 sections, 16 equations, 2 figures, 8 tables, 1 algorithm.

Figures (2)

  • Figure 1: Example forecasts shown with the ground truth, the MAE, and the uncertainty reported by MoGU with three experts. The forecasts for the ETTh1 dataset (a) were generated using PatchTST as the expert architecture, while those for ETTm1 (b) were generated using iTransformer.
  • Figure 2: Heatmaps of the Pearson correlation between MoGU's reported uncertainties (aleatoric, epistemic, and total) and the MAE of its predictions. The correlation is displayed per variable for the ETT datasets.
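The correlation analysis summarized in Figure 2 can be sketched as follows. This is a minimal illustration, assuming one uncertainty value and one absolute error per forecast for a single variable; the helper `pearson` and the toy numbers are hypothetical and are not taken from the paper's results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy values for one variable: reported uncertainty and absolute error
# per forecast (made-up numbers, for illustration only).
uncertainty = [0.10, 0.20, 0.35, 0.50, 0.80]
abs_error = [0.12, 0.18, 0.40, 0.45, 0.90]

r = pearson(uncertainty, abs_error)
```

Repeating this per variable and per uncertainty type (aleatoric, epistemic, total) would give one heatmap cell each; a value of `r` near 1 means that larger reported uncertainty tends to accompany larger errors.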