Ex Uno Pluria: Insights on Ensembling in Low Precision Number Systems

Giung Nam, Juho Lee

TL;DR

This work first showcases the potential of low precision ensembling, where ensemble members are derived from a single model within low precision number systems in a training-free manner, and demonstrates the effectiveness of the proposed low precision ensembling method compared to existing ensemble approaches.

Abstract

While ensembling deep neural networks has shown promise in improving generalization performance, scaling current ensemble methods for large models remains challenging. Given that recent progress in deep learning is largely driven by scale, exemplified by the widespread adoption of large-scale neural network architectures, scalability emerges as an increasingly critical issue for machine learning algorithms in the era of large-scale models. In this work, we first showcase the potential of low precision ensembling, where ensemble members are derived from a single model within low precision number systems in a training-free manner. Our empirical analysis demonstrates the effectiveness of our proposed low precision ensembling method compared to existing ensemble approaches.

Paper Structure

This paper contains 20 sections, 9 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Concepts of low precision ensembling. It shows a two-dimensional schematic, where the x and y axes represent the neural network weights, while the contours visualize the loss surface. (a) Let the pre-trained weights, denoted by a yellow star-shaped marker ($\medstar$), be positioned within a basin on the loss landscape. In general, (b) post-training quantization methods introduce lower precision number systems, and then (c) choose one candidate from the system, such as the nearest one. (d) However, there are many other highly effective models available that can contribute to ensemble predictions.
  • Figure 2: Comparing low precision ensembling to Bayesian methods. Negative log-likelihood for Bayesian model averaging using an approximate Gaussian posterior derived from SWAG or IVON (BMA, shown in orange) and low precision ensembling with Bernoulli stochastic rounding centered around the MAP solution obtained by each optimizer (LPE-BSR, shown in green).
  • Figure 3: Comparison between IVON and LPE-BSR samples. Radial landscape plots visualize a plane subspace defined by three points: the MAP obtained by IVON (depicted as a yellow star $\medstar$), samples in BMA and LPE-BSR procedures (represented by blue and red circle markers $\circ$).
  • Figure 4: Comparison between snapshot and LPE-BSR samples. Radial landscape plots visualize a plane subspace defined by three points: the first and second snapshot samples obtained by SSE (represented by yellow and blue star-shaped markers $\medstar$), and an LPE-BSR sample derived from the first snapshot (depicted as a red circle $\circ$).
  • Figure 5: Combining with fast ensembling methods. Negative log-likelihood and expected calibration error for fast ensembling methods, SSE and CSGLD, in terms of training budgets, i.e., the number of backward passes, and memory budgets, i.e., the total number of bits for representing the ensemble. Top: Results with SSE. Bottom: Results with CSGLD.
  • ...and 4 more figures
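
The captions above refer to low precision ensembling with Bernoulli stochastic rounding (LPE-BSR) around a single pre-trained solution. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes a uniform quantization grid with spacing `step` and a toy linear softmax classifier; the function names `stochastic_round` and `lpe_predict` and all parameters are hypothetical and chosen only for illustration. The number system actually used in the paper is not specified in this section.

```python
import numpy as np

def stochastic_round(w, step, rng):
    """Round each weight to a neighbouring grid point (a multiple of `step`).

    Each weight is rounded up with probability equal to its fractional
    position between the two neighbours, so the rounding is unbiased.
    The uniform grid is an assumption made for this sketch.
    """
    scaled = w / step
    lower = np.floor(scaled)
    prob_up = scaled - lower                 # fractional part, in [0, 1)
    up = rng.random(w.shape) < prob_up       # one Bernoulli draw per weight
    return (lower + up) * step

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def lpe_predict(w, x, step, n_members, seed=0):
    """Average the predictions of `n_members` stochastically rounded copies of `w`.

    No training is involved: every member is drawn from the same weights.
    The linear model `x @ w` stands in for an arbitrary network.
    """
    rng = np.random.default_rng(seed)
    probs = [softmax(x @ stochastic_round(w, step, rng)) for _ in range(n_members)]
    return np.mean(probs, axis=0)
```

For example, with weights `w` of shape `(4, 3)` and inputs `x` of shape `(5, 4)`, `lpe_predict(w, x, step=0.25, n_members=8)` returns a `(5, 3)` array of averaged class probabilities. Each member differs only in its random rounding decisions, which is what makes the ensemble training-free.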