Noisy Deep Ensemble: Accelerating Deep Ensemble Learning via Noise Injection

Shunsuke Sakai, Shunsuke Tsuge, Tatsuhito Hasegawa

TL;DR

The paper tackles the high training cost of neural network ensembles by proposing Noisy Deep Ensemble: a parent model is trained to convergence, its weights are perturbed to create multiple diverse child models, and the children are trained briefly and ensembled by averaging their predictions. The perturbation combines a mask with either Gaussian or uniform noise, controlled by a perturbation proportion $\alpha$ and a scale $\beta$, so that the child models explore different local minima while training time stays low. Empirical results on CIFAR-10/100 across CNN architectures show that Noisy Deep Ensemble often matches or surpasses traditional ensembles and outperforms Snapshot Ensemble, with notable gains in diversity and robustness; with $M=10$ ensemble members, training time can drop to about $35\%$ of that of a standard ensemble. The work demonstrates practical benefits for scalable, accurate ensembles and provides a framework for future improvements via targeted perturbation strategies and calibration analysis.

Abstract

Neural network ensembling is a simple yet effective approach for enhancing generalization capabilities. The most common method involves independently training multiple neural networks initialized with different weights and then averaging their predictions during inference. However, this approach increases training time linearly with the number of ensemble members. To address this issue, we propose the novel "Noisy Deep Ensemble" method, which significantly reduces the training time required for neural network ensembles. In this method, a parent model is trained until convergence, and then the weights of the parent model are perturbed in various ways to construct multiple child models. This perturbation of the parent model weights facilitates the exploration of different local minima while significantly reducing the training time for each ensemble member. We evaluated our method using diverse CNN architectures on the CIFAR-10 and CIFAR-100 datasets, surpassing conventional efficient ensemble methods and achieving test accuracy comparable to standard ensembles. Code is available at https://github.com/TSTB-dev/NoisyDeepEnsemble
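The perturbation and averaging steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the additive form of the noise, the Bernoulli mask drawn with probability $\alpha$, the use of $\beta$ as the Gaussian standard deviation or the uniform half-width, and the helper names `perturb_weights` and `ensemble_predict` are all assumptions made for illustration. The brief fine-tuning of each child model is omitted.

```python
import numpy as np

def perturb_weights(weights, alpha, beta, noise="gaussian", rng=None):
    """Return a perturbed copy of a parent weight array (hypothetical helper).

    alpha: assumed probability that any single weight is perturbed (mask proportion).
    beta:  assumed noise scale (std for Gaussian, half-width for uniform).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Binary mask selecting roughly a fraction `alpha` of the weights.
    mask = rng.random(weights.shape) < alpha
    if noise == "gaussian":
        eps = rng.normal(0.0, beta, size=weights.shape)
    elif noise == "uniform":
        eps = rng.uniform(-beta, beta, size=weights.shape)
    else:
        raise ValueError(f"unknown noise type: {noise}")
    # Assumption: noise is added to the masked weights; unmasked weights are unchanged.
    return weights + mask * eps

def ensemble_predict(member_probs):
    """Average class-probability arrays of shape (n_samples, n_classes)."""
    return np.mean(np.stack(member_probs), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parent = rng.normal(size=(4, 3))  # stand-in for one layer of a trained parent
    children = [perturb_weights(parent, alpha=0.5, beta=0.1, rng=rng) for _ in range(3)]
    print(len(children), children[0].shape)
```

In practice each child would be fine-tuned for a small number of epochs before its predictions enter the average; how many epochs, and which layers are perturbed, are choices this sketch leaves open.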

Paper Structure

This paper contains 18 sections, 5 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Difference in the learning process between Noisy Deep Ensemble and the existing method (Snapshot Ensemble). Through noise injection, Noisy Deep Ensemble is not limited to the optimization path of SGD, which promotes exploration of a wider region of parameter space. Snapshot Ensemble consists of $\mathcal{M} = \{\theta_2, \theta_3\}$, while Noisy Deep Ensemble consists of $\mathcal{M} = \{\theta_2', \theta_3'\}$.
  • Figure 2: Overview of the Noisy Deep Ensemble
  • Figure 3: The disagreement rate among ensemble members' predictions
  • Figure 4: Average KL divergence between the prediction probability distributions of ensemble members on the test data
  • Figure 5: The effects of changes in perturbations on the accuracy
  • ...and 1 more figure
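The two diversity measures named in Figures 3 and 4 can be sketched as follows. The exact definitions used in the paper are not given in this summary, so this is an assumed formulation: the disagreement rate is taken as the fraction of test samples on which two members predict different labels, averaged over member pairs, and the KL divergence is averaged over samples and over ordered member pairs. The function names are hypothetical.

```python
from itertools import combinations, permutations
import numpy as np

def mean_pairwise_disagreement(member_probs):
    """Average fraction of samples on which two members predict different labels."""
    labels = [p.argmax(axis=1) for p in member_probs]
    rates = [np.mean(labels[i] != labels[j])
             for i, j in combinations(range(len(labels)), 2)]
    return float(np.mean(rates))

def mean_pairwise_kl(member_probs, eps=1e-12):
    """Average KL(p_i || p_j) over samples and over ordered member pairs."""
    # Clip to avoid log(0); eps is an arbitrary small constant chosen here.
    clipped = [np.clip(p, eps, 1.0) for p in member_probs]
    kls = []
    for i, j in permutations(range(len(clipped)), 2):
        p, q = clipped[i], clipped[j]
        kls.append(np.mean(np.sum(p * np.log(p / q), axis=1)))
    return float(np.mean(kls))

if __name__ == "__main__":
    a = np.array([[0.9, 0.1], [0.2, 0.8]])
    b = np.array([[0.4, 0.6], [0.2, 0.8]])
    print(mean_pairwise_disagreement([a, b]))
    print(mean_pairwise_kl([a, b]))
```

Both functions expect a list of arrays of shape (n_samples, n_classes) holding each member's predicted class probabilities on the same test samples.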