Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers

Yiwei Lu; Yaoliang Yu; Xinlin Li; Vahid Partovi Nia

TL;DR

This work tackles the challenge of training neural networks with binary weights, where the sign function applied in the forward pass has a zero derivative wherever it is defined, so plain backpropagation cannot update the weights. It introduces ProxConnect++ (PC++), a generalization of ProxConnect that couples a forward proximal quantizer ${\mathsf{F}}^{\mu}_{\mathsf{r}}$ with a backward proximal quantizer ${\mathsf{B}}^{\mu}_{\mathsf{r}}$, and shows that many existing binarization methods are special cases of this framework. The authors prove a decomposition criterion and establish convergence guarantees for PC++, enabling principled design of forward-backward quantizers; they also reverse-engineer known methods (e.g., BNN, BNN+) to show their place in PC++, and propose BNN++ as a one-step, theoretically grounded improvement. Empirically, PC++ is validated on CNNs and vision transformers, achieving competitive accuracy with up to ~30x memory reduction, with BNN++ frequently delivering the best performance across tasks. Overall, the work provides a unified, theoretically justified pathway to design and evaluate binarization schemes with practical performance benefits for both convolutional and transformer-based architectures.
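
To put the memory figure in context, the following is a minimal sketch of the storage arithmetic only, assuming weights are otherwise stored as 32-bit floats and that each binary weight is packed into a single bit. The helper names (`pack_signs`, `unpack_signs`, `float32_bytes`) are hypothetical and are not taken from the paper; the sketch shows the ideal 32x ratio for a fully binarized weight vector, not how the reported ~30x was measured.

```python
import struct


def pack_signs(weights):
    """Pack the signs of `weights` into bytes, one bit per weight.

    Bit value 1 encodes +1 (weight >= 0), bit value 0 encodes -1.
    The tie-breaking rule at zero is a convention chosen for this sketch.
    """
    packed = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, n):
    """Recover the first `n` binary weights in {-1.0, +1.0} from packed bytes."""
    return [1.0 if (packed[i // 8] >> (i % 8)) & 1 else -1.0 for i in range(n)]


def float32_bytes(n):
    """Number of bytes needed to store `n` weights as 32-bit floats."""
    return struct.calcsize(f"<{n}f")
```

For a vector of 1024 weights, `float32_bytes(1024)` is 4096 bytes while `pack_signs` produces 128 bytes, which is the 32x upper bound for this storage scheme.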

Abstract

In neural network binarization, BinaryConnect (BC) and its variants are considered the standard. These methods apply the sign function in their forward pass, and the resulting gradients are backpropagated to update the weights. However, the derivative of the sign function is zero wherever it is defined, which consequently freezes training. Therefore, implementations of BC (e.g., BNN) usually replace the derivative of sign in the backward computation with the identity or other approximate gradient alternatives. Although such practice works well empirically, it is largely a heuristic or "training trick." We aim to shed some light on these training tricks from the optimization perspective. Building on existing theory on ProxConnect (PC, a generalization of BC), we (1) equip PC with different forward-backward quantizers and obtain ProxConnect++ (PC++) that includes existing binarization techniques as special cases; (2) derive a principled way to synthesize forward-backward quantizers with automatic theoretical guarantees; (3) illustrate our theory by proposing an enhanced binarization algorithm BNN++; (4) conduct image classification experiments on CNNs and vision transformers, and empirically verify that BNN++ generally achieves competitive results on binarizing these models.
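
The training trick described above can be illustrated with a minimal sketch, assuming a toy linear least-squares model, the sign function in the forward pass, and the identity in place of the derivative of sign in the backward pass. The function `train_binary`, its parameters, and the tie-breaking rule at zero are hypothetical choices made for illustration; this is not the paper's PC++ or BNN++ algorithm.

```python
import random


def sign(x):
    # Map to {-1, +1}; zero is sent to +1 (a convention chosen for this sketch).
    return 1.0 if x >= 0 else -1.0


def train_binary(xs, ys, steps=50, lr=0.05, seed=0):
    """Fit y ~ sum_j sign(w_j) * x_j while keeping full-precision latent weights w.

    Forward pass: uses the binarized weights sign(w).
    Backward pass: treats d sign(w) / dw as 1 (identity), so the gradient
    computed at the binary weights is applied directly to w.
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]  # full-precision weights
    for _ in range(steps):
        wb = [sign(v) for v in w]  # binary weights used in the forward pass
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = sum(b * xi for b, xi in zip(wb, x)) - y
            for j in range(dim):
                # Gradient of the mean of 0.5 * err**2 with respect to wb[j].
                grad[j] += err * x[j] / len(xs)
        # Identity backward: update the latent weights with the binary-weight gradient.
        w = [v - lr * g for v, g in zip(w, grad)]
    return w, [sign(v) for v in w]
```

With the true derivative of sign (zero wherever defined) the update to `w` would vanish and the weights would never move, which is the freezing effect the abstract refers to. On orthogonal inputs, such as the rows of a 4x4 Hadamard matrix with targets generated by binary weights, this sketch recovers the generating signs.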

Paper Structure (24 sections, 5 theorems, 30 equations, 5 figures, 8 tables)

Introduction
Background
Post-Training Binarization (PTB)
Binarization-Aware Training (BAT)
ProxConnect
Methodology
ProxConnect++
Experiments
Experimental settings
CNN as backbone
Vision transformer as backbone
Conclusion
More on Proximal Quantizers
Related works
Vision Transformer
...and 9 more sections

Key Result

Corollary 1

A pair of forward-backward quantizers $(\mathsf{F}, \mathsf{B})$ admits the decomposition in eq:FB-cons (for some smoothing parameter $\mu$ and regularizer $\mathsf{r}$) iff both $\mathsf{F}$ and $\mathsf{B}$ are functions of ${\bm{\mathsfit{P}}}(w) := \int_{-\infty}^w \tfrac{1}{\mathsf{B}(\omega)}\,\cdots$

Figures (5)

Figure 1: Forward and backward pass for ProxConnect++ algorithms (red/blue arrows indicate the forward/backward pass), where fp denotes full precision, bn denotes binary and back-prop denotes backpropagation.
Figure 2: Comparison between Full Precision (FP) model, BNN++, and Post-training Binarization (PTB) on the fine-tuning task on CIFAR-10.
Figure 3: Results of binarizing different components (blocks) of the ViT-B architecture on CIFAR-10. Warmer colors indicate more significant accuracy degradation after binarization.
Figure 4: Different instantiations of the proximal map ${\bm{\mathsfit{L}}}^{\varrho}_{\rho}$ in eq:prox-pl for $Q = \{-1, 1\}$.
Figure 5: Forward and backward pass for 6 additional ProxConnect++ algorithms.

Theorems & Definitions (8)

Corollary 1
Example 1: BNN
Example 2: BNN+
Example 3: BNN++
Corollary 2
Corollary 2
Theorem 1
Theorem 2
