Coded Deep Learning: Framework and Algorithm

En-hui Yang; Shayan Mohajer Hamidi

TL;DR

CDL introduces a framework that injects information-theoretic coding into deep learning by using trainable probabilistic quantizers with CPMFs $P_{\alpha}(\cdot|\theta)$ and entropy constraints $H(\cdot)$, enabling quantized forward/backward passes and compressible weights/activations during training. The method yields a softened gradient path via the soft deterministic quantizer $\mathsf{Q}_{\rm d}(\cdot)$ and an entropy-regularized objective that minimizes both cross-entropy loss and description lengths. A relaxed variant, R-CDL, uses $\mathsf{Q}_{\rm d}(\cdot)$ for gradients with full-precision forward/backward, delivering better accuracy–compression trade-offs. Across CIFAR-100 and ImageNet with ResNet variants, CDL and R-CDL outperform existing QAT baselines at similar or lower bit regimes, while enabling highly compressible models via Huffman coding. The approach reduces training/inference complexity and communication costs in model/data parallelism, with the trained model stored in a quantized, compressible format.
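The relationship between the probabilistic quantizer $\mathsf{Q}_{\rm p}(\cdot)$, its CPMF $P_{\alpha}(\cdot|\theta)$, and the soft deterministic quantizer $\mathsf{Q}_{\rm d}(\cdot)$ can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from this summary: the symmetric level set with $2^b$ levels spaced $q$ apart, the softmax-style CPMF proportional to $\exp(-\alpha(\theta-\hat{\theta})^2)$, the choice of $\mathsf{Q}_{\rm d}(\theta)$ as the conditional mean of $\mathsf{Q}_{\rm p}(\theta)$, and all function names.

```python
import math
import random

def levels(b, q):
    """Assumed level set: 2**b reproduction levels spaced q apart, roughly centered at 0."""
    half = 2 ** (b - 1)
    return [j * q for j in range(-half, half)]

def cpmf(theta, alpha, b, q):
    """Assumed CPMF: P_alpha(level | theta) proportional to exp(-alpha * (theta - level)**2)."""
    lv = levels(b, q)
    scores = [-alpha * (theta - v) ** 2 for v in lv]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return lv, [w / total for w in weights]

def q_p(theta, alpha, b, q, rng=random):
    """Probabilistic quantizer: sample one level according to the CPMF."""
    lv, p = cpmf(theta, alpha, b, q)
    return rng.choices(lv, weights=p, k=1)[0]

def q_d(theta, alpha, b, q):
    """Soft deterministic quantizer, assumed here to be the conditional mean of q_p."""
    lv, p = cpmf(theta, alpha, b, q)
    return sum(v * pi for v, pi in zip(lv, p))

def q_u(theta, b, q):
    """Plain uniform quantizer: map theta to the nearest level."""
    return min(levels(b, q), key=lambda v: abs(theta - v))

if __name__ == "__main__":
    theta, b, q = 0.13, 3, 0.1
    for alpha in (100, 300, 500, 700):
        print(alpha, q_d(theta, alpha, b, q), q_u(theta, b, q), q_p(theta, alpha, b, q))
```

Under these assumptions, `q_d` is a smooth function of `theta`, which is what makes an analytic gradient possible, and it moves toward the hard uniform quantizer `q_u` as `alpha` grows.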

Abstract

The success of deep learning (DL) is often achieved with large models and high complexity during both training and post-training inference, hindering training in resource-limited settings. To alleviate these issues, this paper introduces a new framework dubbed "coded deep learning" (CDL), which integrates information-theoretic coding concepts into the inner workings of DL, to significantly compress model weights and activations, reduce computational complexity at both training and post-training inference stages, and enable efficient model/data parallelism. Specifically, within CDL, (i) we first propose a novel probabilistic method for quantizing both model weights and activations, along with a soft differentiable variant that offers an analytic formula for gradient calculation during training; (ii) both the forward and backward passes during training are executed over quantized weights and activations, eliminating most floating-point operations and reducing training complexity; (iii) during training, both weights and activations are entropy constrained so that they are compressible in an information-theoretic sense throughout training, thus reducing communication costs in model/data parallelism; and (iv) the trained model in CDL is by default in a quantized format with compressible quantized weights, reducing post-training inference and storage complexity. Additionally, a variant of CDL, namely relaxed CDL (R-CDL), is presented to further improve the trade-off between validation accuracy and compression; although it requires full precision in training, it keeps the other advantageous features of CDL intact. Extensive empirical results show that CDL and R-CDL outperform the state-of-the-art DNN compression algorithms in the literature.
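To make "compressible in an information-theoretic sense" concrete, the following Python sketch computes the empirical entropy of a list of quantized values and the average codeword length of a binary Huffman code built on the same values. It is a generic illustration of the standard relation between entropy and Huffman code length, not the paper's training objective; the function names and the toy data are hypothetical.

```python
import heapq
import math
from collections import Counter

def empirical_entropy(symbols):
    """Empirical entropy of the symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def huffman_lengths(symbols):
    """Return {symbol: codeword length} for a binary Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def average_bits(symbols):
    """Average Huffman codeword length, in bits per symbol."""
    counts = Counter(symbols)
    lengths = huffman_lengths(symbols)
    return sum(counts[s] * lengths[s] for s in counts) / len(symbols)

if __name__ == "__main__":
    # Toy quantized weight indices, heavily concentrated on one level.
    toy = [0] * 90 + [1] * 5 + [-1] * 3 + [2] * 2
    print(empirical_entropy(toy), average_bits(toy))
```

The average Huffman length always lies within one bit of the empirical entropy from above, so lowering the entropy of the quantized weights or activations lowers the achievable storage and communication cost per value.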

Paper Structure (19 sections, 2 theorems, 38 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Related Works
Quantization-Aware Training (QAT)
Modes of Parallelism
Model parallelism
Data parallelism
Notation and Preliminaries
Notation
Preliminaries
Probabilistic Quantization Method
Probabilistic Quantizer $\mathsf{Q}_{\rm p}(\cdot)$
Soft Deterministic Quantizer $\mathsf{Q}_{\rm d}(\cdot)$
CDL and R-CDL
Incorporating Trainable Probabilistic Quantizers $\mathsf{Q}_{\rm p}(\cdot)$ Into CDL
Computation of Gradients of the Loss Function \ref{eq:Qloss}
...and 4 more sections

Key Result

Proposition 1

For any $\theta$ and $\alpha > 0$, where $\text{Var} \{ \mathsf{Q}_{\rm p}(\theta) ~|~\theta \}$ is the conditional variance of $\mathsf{Q}_{\rm p}(\theta)$ given $\theta$.

Figures (5)

Figure 1: Illustration of the partial derivatives of $\mathsf{Q}_{\rm d}(\theta)$ w.r.t. $\theta$ (left), and $q$ (right) for $\alpha=\{ 100,300,500,700\}$, where $b$ and $q$ equal $3$ and $0.1$, respectively.
Figure 2: Illustration of $\mathsf{Q}_{\rm u}(\cdot)$ vs $\mathsf{Q}_{\rm d}(\cdot)$ with $\alpha=\{ 100,300,500,700\}$, where $b$ and $q$ are set to $3$ and $0.1$, respectively.
Figure 3: Illustration of the CDL's mechanism.
Figure 4: Comparison of models trained by CDL, R-CDL, and benchmark methods in terms of the Top-1 accuracy vs the average number of bits per weight (top)/activation (bottom) on ImageNet: (a) ResNet-18, and (b) ResNet-34. All models are trained from scratch.

Theorems & Definitions (6)

Remark
Proposition 1
proof
Lemma 1
Remark
Remark
