The Hidden Power of Pure 16-bit Floating-Point Neural Networks

Juyoung Yun; Byungkon Kang; Zhoulai Fu

TL;DR

The paper asks whether pure 16-bit floating-point neural networks can match the accuracy of 32-bit models in classification tasks. It develops an error-tolerance framework, defining the output discrepancy $\delta(M_{32},M_{16},x)$ and the prediction margin $\Gamma(M,x)$, and proves a sufficient condition $\Gamma(M_{32},x) \geq 2\delta(M_{32},M_{16},x)$ under which $\mathrm{pred}(M_{32},x)=\mathrm{pred}(M_{16},x)$. The authors validate their theory with MNIST and CIFAR-10 experiments across DNN and CNN architectures, showing that pure 16-bit networks achieve competitive or better accuracy than 32-bit networks and often surpass mixed-precision baselines, while delivering substantial reductions in training time and model size. They also discuss practical limitations, such as the need to tune the optimizer's epsilon, challenges with batch normalization in pure 16-bit, and batch-size effects, and they outline avenues for extending the work to more architectures and tasks. Overall, the findings challenge the notion that pure 16-bit training is impractical, highlighting significant efficiency gains with minimal loss in performance in typical classification scenarios.

Abstract

Lowering the precision of neural networks from the prevalent 32-bit precision has long been considered harmful to performance, despite the gain in space and time. Many works propose various techniques to implement half-precision neural networks, but none study pure 16-bit settings. This paper investigates the unexpected performance gain of pure 16-bit neural networks over the 32-bit networks in classification tasks. We present extensive experimental results that favorably compare various 16-bit neural networks' performance to those of the 32-bit models. In addition, a theoretical analysis of the efficiency of 16-bit models is provided, which is coupled with empirical evidence to back it up. Finally, we discuss situations in which low-precision training is indeed detrimental.

Paper Structure (20 sections, 1 theorem, 9 equations, 6 figures, 5 tables)

Introduction
Related Work
Background
General Notation
Floating-Point Representation
Floating-point Errors
Theory
Experiments
DNN Experiments
CNN Experiments
Training Time and Accuracy
Model Size
Limitation and Discussion
Light hyperparameter-tuning.
Missing 16-bit Batch normalization.
...and 5 more sections

Key Result

Lemma 4.3

Consider a classification problem characterized by a pair $(X,Y)$, where $X$ is the space of input data and $Y = \{0, \ldots, N-1\}$ is the set of class labels. Suppose a learning algorithm trains a 32-bit model $M_{32}: X \to \mathbf{F}_{32}^N$ and a 16-bit model $M_{16}: X \to \mathbf{F}_{16}^N$ on this problem. If, for an input $x \in X$, $\Gamma(M_{32},x) \geq 2\delta(M_{32},M_{16},x)$, then $\mathrm{pred}(M_{32}, x)=\mathrm{pred}(M_{16}, x)$.
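The sufficient condition from the TL;DR, $\Gamma(M_{32},x) \geq 2\delta(M_{32},M_{16},x)$, can be checked numerically on a pair of output vectors. The sketch below is illustrative only. Definitions 4.1 and 4.2 are not reproduced on this page, so it assumes that $\delta$ is the largest absolute per-class difference between the two outputs and that $\Gamma$ is the gap between the largest and second-largest output of a model. The function names `margin`, `discrepancy`, and `condition_holds` and the toy output vectors are hypothetical, not taken from the paper.

```python
import numpy as np


def margin(outputs):
    """Assumed prediction margin: largest output minus second-largest output."""
    ordered = np.sort(np.asarray(outputs, dtype=np.float64))
    return float(ordered[-1] - ordered[-2])


def discrepancy(out32, out16):
    """Assumed output discrepancy: largest absolute per-class difference."""
    a = np.asarray(out32, dtype=np.float64)
    b = np.asarray(out16, dtype=np.float64)
    return float(np.max(np.abs(a - b)))


def condition_holds(out32, out16):
    """Check the sufficient condition margin(out32) >= 2 * discrepancy."""
    return margin(out32) >= 2.0 * discrepancy(out32, out16)


# Hypothetical outputs for one input with a wide margin:
# the 16-bit perturbation is too small to change the top class.
wide32 = np.array([2.0, 0.5, -1.0], dtype=np.float32)
wide16 = np.array([1.9, 0.6, -1.0], dtype=np.float16)

# Hypothetical outputs for one input with a narrow margin:
# the condition fails, and here the predicted class actually flips.
narrow32 = np.array([1.0, 0.95, 0.0], dtype=np.float32)
narrow16 = np.array([0.97, 0.99, 0.0], dtype=np.float16)

for name, o32, o16 in [("wide", wide32, wide16), ("narrow", narrow32, narrow16)]:
    same = int(np.argmax(o32)) == int(np.argmax(o16))
    print(name, "condition holds:", condition_holds(o32, o16), "same prediction:", same)
```

The condition is sufficient rather than necessary: when it fails, the two predictions may still agree, so a failed check only means the guarantee does not apply to that input.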

Figures (6)

Figure 1: Top-1 accuracy (top row) and computational time (bottom row) on MNIST Classification using DNN
Figure 2: Top-1 Accuracy, Top-2 Accuracy, and Computational Time for CIFAR-10 Classification
Figure 3: MNIST classification top-1 accuracy and computational time
Figure 4: CIFAR-10 classification top-1 and top-2 accuracy and computational time without BN Layers
Figure 5: Results from using 16-bit BN layers
...and 1 more figures

Theorems & Definitions (4)

Definition 4.1
Definition 4.2
Lemma 4.3
proof
