Bayesian Neural Networks vs. Mixture Density Networks: Theoretical and Empirical Insights for Uncertainty-Aware Nonlinear Modeling
Riddhi Pratim Ghosh, Ian Barnett
TL;DR
The paper addresses uncertainty quantification in nonlinear regression by comparing Bayesian Neural Networks (BNNs), which encode uncertainty through parameter posteriors, with Mixture Density Networks (MDNs), which model conditional densities directly. It develops a unified theoretical framework showing that, under Hölder smoothness, MDNs achieve faster convergence in Kullback-Leibler (KL) divergence. Their bound combines an approximation error scaling as $K n^{-2s/d}$ with an estimation term, while BNNs incur an additional variational-inference term, $\mathrm{KL}(q^*\|\pi)/N$. Empirically, MDNs better capture multimodal responses and adaptive uncertainty, particularly on multimodal synthetic tasks, whereas BNNs offer more interpretable epistemic uncertainty in data-scarce settings. Real-data analysis on the RSNA Pediatric Bone Age dataset shows sharper, better-calibrated predictive densities from MDNs, highlighting their practicality for uncertainty-aware medical decision support. The work clarifies the complementary roles of posterior-based and likelihood-based uncertainty modeling and offers guidance for principled model choice in nonlinear, uncertain environments.
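The following is a minimal sketch of where a $\mathrm{KL}(q^*\|\pi)/N$-type term comes from. It rests on three assumptions that the source does not state: a mean-field Gaussian variational family, an isotropic Gaussian prior, and $N$ being the number of training samples, as in the usual per-sample (averaged) ELBO. The helper names are hypothetical. The variational posterior is deliberately held fixed so the $1/N$ factor is isolated. In the actual analysis $q^*$ is the optimized posterior and itself depends on the data.

```python
import numpy as np


def kl_diag_gaussian(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """KL(q || pi) summed over parameters, ASSUMING a mean-field Gaussian
    q = N(mu_q, sigma_q^2) and an isotropic Gaussian prior pi = N(mu_p, sigma_p^2)."""
    mu_q = np.asarray(mu_q, dtype=float)
    sigma_q = np.asarray(sigma_q, dtype=float)
    # Closed-form KL between univariate Gaussians, applied elementwise, then summed.
    kl = (np.log(sigma_p / sigma_q)
          + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
          - 0.5)
    return float(np.sum(kl))


def per_sample_kl_penalty(mu_q, sigma_q, n_train, mu_p=0.0, sigma_p=1.0):
    """Hypothetical helper: the KL regularizer as it enters an ELBO averaged
    over n_train samples, i.e. KL(q || pi) / N."""
    return kl_diag_gaussian(mu_q, sigma_q, mu_p, sigma_p) / n_train


# Usage: a toy posterior over 1000 weights, held fixed while N grows.
rng = np.random.default_rng(0)
mu_q = rng.normal(0.0, 0.5, size=1000)   # toy posterior means
sigma_q = np.full(1000, 0.1)             # toy posterior standard deviations
for n in (100, 1_000, 10_000):
    print(n, per_sample_kl_penalty(mu_q, sigma_q, n))
```

The penalty shrinks as $1/N$ for a fixed $q$. It vanishes only when $q$ coincides with the prior, which is exactly the case in which the posterior has learned nothing from the data.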
Abstract
This paper investigates two prominent probabilistic neural modeling paradigms for uncertainty-aware nonlinear regression: Bayesian Neural Networks (BNNs) and Mixture Density Networks (MDNs). BNNs incorporate epistemic uncertainty by placing prior distributions over network parameters. MDNs instead directly model the conditional output distribution, thereby capturing multimodal and heteroscedastic data-generating mechanisms. We present a unified theoretical and empirical framework comparing these approaches. On the theoretical side, we derive convergence rates and error bounds under Hölder smoothness conditions, showing that MDNs achieve faster Kullback-Leibler (KL) divergence convergence due to their likelihood-based nature, whereas BNNs exhibit additional approximation bias induced by variational inference. Empirically, we evaluate both architectures on synthetic nonlinear datasets and a radiographic benchmark (RSNA Pediatric Bone Age Challenge). Quantitative and qualitative results demonstrate that MDNs more effectively capture multimodal responses and adaptive uncertainty, whereas BNNs provide more interpretable epistemic uncertainty under limited data. Our findings clarify the complementary strengths of posterior-based and likelihood-based probabilistic learning, offering guidance for uncertainty-aware modeling in nonlinear systems.
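The sketch below shows the likelihood-based objective an MDN optimizes. It assumes Gaussian mixture components and a one-dimensional response. The helper names (`mdn_log_density`, `mdn_nll`) and the toy parameters are hypothetical, and the network that maps an input $x$ to the mixture parameters is omitted. It illustrates how a mixture can place density on several modes and give each mode its own width, which is what lets an MDN represent multimodal and heteroscedastic outputs.

```python
import numpy as np


def logsumexp(a, axis, keepdims=False):
    """Numerically stable log(sum(exp(a))) along `axis`."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)


def mdn_log_density(y, logits, means, log_sigmas):
    """Hypothetical helper: log p(y | x) = log sum_k pi_k N(y; mu_k, sigma_k^2).
    logits, means, log_sigmas have shape (batch, K) and would be produced by
    the network from each x; y has shape (batch,)."""
    y = np.asarray(y, dtype=float)[:, None]
    log_pi = logits - logsumexp(logits, axis=1, keepdims=True)  # log-softmax mixing weights
    sigmas = np.exp(log_sigmas)                                 # exp keeps std devs positive
    log_comp = (-0.5 * np.log(2.0 * np.pi) - log_sigmas
                - 0.5 * ((y - means) / sigmas) ** 2)             # log N(y; mu_k, sigma_k^2)
    return logsumexp(log_pi + log_comp, axis=1)


def mdn_nll(y, logits, means, log_sigmas):
    """Average negative log-likelihood, the usual MDN training loss."""
    return float(-np.mean(mdn_log_density(y, logits, means, log_sigmas)))


# Usage: one input x whose conditional density has modes at -2 and +2.
logits = np.array([[0.0, 0.0]])
means = np.array([[-2.0, 2.0]])
log_sigmas = np.log(np.array([[0.5, 0.5]]))
for y in (0.0, 2.0):
    print(y, mdn_log_density(np.array([y]), logits, means, log_sigmas)[0])
```

With equal weights and modes at $\pm 2$, the log-density at $y = 2$ is far higher than at $y = 0$. The point $y = 0$ is the conditional mean that a single-Gaussian regressor fitted by squared error would predict. Working in log space with a log-softmax and `logsumexp` avoids underflow when a component sits far from the observed $y$.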
