Divergent Ensemble Networks: Enhancing Uncertainty Estimation with Shared Representations and Independent Branching

Arnav Kharbanda, Advait Chandorkar

TL;DR

The paper addresses reliable uncertainty estimation in neural networks by reducing parameter redundancy through a shared-to-branching architecture called the Divergent Ensemble Network (DEN). DEN preserves ensemble diversity by training multiple independent branches on top of a shared representation, achieving faster inference while retaining robust uncertainty estimates. Experiments on MNIST, NotMNIST, and a toy regression task show that DEN matches or surpasses the baselines in accuracy and uncertainty calibration, handles out-of-distribution (OoD) inputs better, and requires substantially less inference time. The work positions DEN as a scalable approach for real-time, uncertainty-aware decision-making in domains such as robotics and medical imaging, and the authors release public code to facilitate adoption.
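This summary does not say how the branch outputs are combined into an uncertainty estimate. A common choice for ensembles, assumed here rather than taken from the paper, is to average the branches' softmax outputs and score uncertainty with the predictive entropy of that average, plus the mutual information (entropy of the mean minus the mean of the per-branch entropies), which grows when branches disagree, as they tend to on OoD inputs such as NotMNIST images fed to an MNIST classifier. The helper names below are hypothetical.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs: List[float]) -> float:
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ensemble_uncertainty(branch_logits: List[List[float]]) -> Tuple[List[float], float, float]:
    """Combine per-branch logits for one input (hypothetical helper).

    Returns the averaged class probabilities, the predictive entropy of that
    average (total uncertainty), and the mutual information, i.e. how much
    the branches disagree with each other.
    """
    probs = [softmax(logits) for logits in branch_logits]
    k = len(probs)
    n_classes = len(probs[0])
    mean_probs = [sum(p[c] for p in probs) / k for c in range(n_classes)]
    total = entropy(mean_probs)
    expected = sum(entropy(p) for p in probs) / k
    return mean_probs, total, total - expected

# Three branches that agree vs. three branches that each pick a different class.
agree = [[5.0, 0.0, 0.0]] * 3
disagree = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
_, h_agree, mi_agree = ensemble_uncertainty(agree)
_, h_disagree, mi_disagree = ensemble_uncertainty(disagree)
print(h_agree, mi_agree, h_disagree, mi_disagree)
```

When the branches agree, the mutual information is essentially zero. When they disagree completely, the averaged prediction is uniform and its entropy reaches the maximum of ln 3 for three classes.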

Abstract

Ensemble learning has proven effective in improving predictive performance and estimating uncertainty in neural networks. However, conventional ensemble methods often suffer from redundant parameter usage and computational inefficiencies due to entirely independent network training. To address these challenges, we propose the Divergent Ensemble Network (DEN), a novel architecture that combines shared representation learning with independent branching. DEN employs a shared input layer to capture common features across all branches, followed by divergent, independently trainable layers that form an ensemble. This shared-to-branching structure reduces parameter redundancy while maintaining ensemble diversity, enabling efficient and scalable learning.
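The shared-to-branching structure can be illustrated with a minimal pure-Python sketch. It assumes fully connected layers with ReLU activations, one hidden layer per branch, Gaussian initialisation, and arbitrary layer sizes. All of these, and every name below, are hypothetical choices and not the authors' released code. The shared layer is evaluated once per input and reused by every branch. The parameter count compares DEN with K fully independent networks that have the same per-member architecture.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[n_out][n_in], biases[n_out])

def dense(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Hypothetical initialisation: Gaussian weights scaled by 1/sqrt(n_in), zero biases.
    weights = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def apply(layer: Layer, x: List[float], relu: bool = True) -> List[float]:
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

class DivergentEnsembleSketch:
    """Shared input layer followed by K independently parameterised branches."""

    def __init__(self, n_in, n_shared, n_hidden, n_out, n_branches, seed=0):
        rng = random.Random(seed)
        self.shared = dense(n_in, n_shared, rng)
        # Each branch draws its own weights, so branches start from different points.
        self.branches = [(dense(n_shared, n_hidden, rng), dense(n_hidden, n_out, rng))
                         for _ in range(n_branches)]

    def forward(self, x: List[float]) -> List[List[float]]:
        h = apply(self.shared, x)  # computed once and reused by every branch
        return [apply(head, apply(hidden, h), relu=False) for hidden, head in self.branches]

def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights + biases

def param_counts(n_in, n_shared, n_hidden, n_out, k):
    # Returns (DEN parameters, parameters of k independent networks).
    branch = dense_params(n_shared, n_hidden) + dense_params(n_hidden, n_out)
    den = dense_params(n_in, n_shared) + k * branch
    independent = k * (dense_params(n_in, n_shared) + branch)
    return den, independent

model = DivergentEnsembleSketch(n_in=4, n_shared=8, n_hidden=6, n_out=3, n_branches=5)
outputs = model.forward([0.1, -0.2, 0.3, 0.4])  # 5 branch outputs, each of length 3
print(param_counts(784, 128, 64, 10, 5))  # (145010, 546930)
```

With these hypothetical sizes (784 inputs, as for a flattened 28×28 MNIST image), the 100,480-parameter input layer is stored once instead of five times. The actual savings depend on how much of the network the chosen architecture shares.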

Paper Structure

This paper contains 18 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Visual representation of the proposed neural network architecture. The number of branches and the number of nodes in each branch are hyperparameters that vary from problem to problem.
  • Figure 2: Comparison of the MNIST and NotMNIST Datasets.
  • Figure 3: Classification performance metrics: (a) Model Accuracy Comparison between Ensemble and Single Model Approaches, (b) Average Inference Time per Model, (c) Performance Metrics: Accuracy and Inference Time for Different Models
  • Figure 4: Regression performance metrics: (a) Mean Squared Error (MSE), (b) Mean Absolute Error (MAE), (c) R² Score, and (d) Average Inference Time.
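The regression metrics in Figure 4 follow their standard definitions: MSE is the mean of the squared residuals, MAE is the mean of the absolute residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean target. A small sketch follows. For an ensemble, `y_pred` would presumably be the averaged branch output, but that aggregation is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple

def regression_metrics(y_true: List[float], y_pred: List[float]) -> Tuple[float, float, float]:
    """Return (MSE, MAE, R^2) for paired targets and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined if all targets are equal (ss_tot == 0)
    return mse, mae, r2

# One prediction off by 1 out of four: MSE = MAE = 0.25, R^2 = 1 - 1/5.
mse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```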