BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training

Luca Colombo, Fabrizio Pittorino, Daniele Zambon, Carlo Baldassi, Manuel Roveri, Cesare Alippi

TL;DR

This work tackles the challenge of training Binary Neural Networks (BNNs) by introducing Binary Error Propagation (BEP), a fully binary analog of backpropagation that propagates binary error signals through multi-layer networks using only bitwise operations. BEP combines integer hidden weights, a fixed binary classifier, a binary backward gate, and a sparse, gated update rule to realize end-to-end binary training for both MLPs and recurrent architectures (BEP-TT). Compared with quantization-aware training and local binary learning rules, it improves test accuracy by up to +6.89% on MLPs and +10.57% on RNNs, while significantly reducing memory and computational requirements. By enabling end-to-end binary learning, BEP opens avenues for efficient deployment in TinyML, privacy-preserving deep learning, and neuromorphic-inspired hardware. Future work aims to extend the method to convolutional and transformer-like architectures and to provide convergence guarantees.

Abstract

Binary Neural Networks (BNNs), which constrain both weights and activations to binary values, offer substantial reductions in computational complexity, memory footprint, and energy consumption. These advantages make them particularly well suited for deployment on resource-constrained devices. However, training BNNs via gradient-based optimization remains challenging due to the discrete nature of their variables. The dominant approach, quantization-aware training, circumvents this issue by employing surrogate gradients. Yet, this method requires maintaining latent full-precision parameters and performing the backward pass with floating-point arithmetic, thereby forfeiting the efficiency of binary operations during training. While alternative approaches based on local learning rules exist, they are unsuitable for global credit assignment and for back-propagating errors in multi-layer architectures. This paper introduces Binary Error Propagation (BEP), the first learning algorithm to establish a principled, discrete analog of the backpropagation chain rule. This mechanism enables error signals, represented as binary vectors, to be propagated backward through multiple layers of a neural network. BEP operates entirely on binary variables, with all forward and backward computations performed using only bitwise operations. Crucially, this makes BEP the first solution to enable end-to-end binary training for recurrent neural network architectures. We validate the effectiveness of BEP on both multi-layer perceptrons and recurrent neural networks, demonstrating gains of up to +6.89% and +10.57% in test accuracy, respectively. The proposed algorithm is released as an open-source repository.
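To make the phrase "using only bitwise operations" concrete, here is a minimal sketch of the standard XNOR/popcount identity for dot products of ±1 vectors, on which binary arithmetic of this kind commonly relies. It is an illustration only and is not taken from the BEP repository; the helper names `pack_bits` and `binary_dot` and the bit encoding (bit set means +1) are assumptions made for this example.

```python
def pack_bits(v):
    """Pack a ±1 vector into an int: bit i is set when v[i] == +1 (assumed encoding)."""
    word = 0
    for i, x in enumerate(v):
        if x == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed ±1 vectors of length n.

    Each matching position contributes +1 and each mismatch -1,
    so the result is n - 2 * (number of mismatching bits).
    """
    mask = (1 << n) - 1                       # keep only the n valid bits
    mismatches = bin((a_bits ^ b_bits) & mask).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))  # ordinary integer dot product
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(reference, fast)
```

Both values agree, so a pre-activation of a binary layer can be computed with an XOR and a bit count per machine word instead of multiply-accumulate operations.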

Paper Structure

This paper contains 35 sections, 4 theorems, 26 equations, 8 figures, 3 tables.

Key Result

Lemma 1

Consider a binary vector $\mathbf{b} \in \{\pm1\}^{K_b}$, a binary matrix $\mathbf{W} \in \{\pm1\}^{K_b \times K_a}$, and a gating vector $\mathbf{g} \in \{0,1\}^{K_b}$. Problem $\mathop{\mathrm{arg\,max}}\limits_{\mathbf{a} \in \{\pm1\}^{K_a}} \langle \mathbf{b}, \mathbf{W}\mathbf{a} \rangle_{\math
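The statement of Lemma 1 is cut off here, so the following is only a sketch under explicit assumptions. It assumes that the gated inner product is $\langle \mathbf{b}, \mathbf{W}\mathbf{a} \rangle_{\mathbf{g}} = \sum_k g_k b_k (\mathbf{W}\mathbf{a})_k$. Under that assumption the objective is linear in $\mathbf{a}$, so a maximizer is given by the sign of $\mathbf{W}^\top(\mathbf{g} \odot \mathbf{b})$, with ties broken arbitrarily. The code checks this candidate against brute-force enumeration on a small random instance; the function names and the tie-breaking rule are hypothetical and not taken from the paper.

```python
from itertools import product
import random


def gated_objective(b, W, g, a):
    """Assumed objective: sum_k g[k] * b[k] * (W a)[k]."""
    return sum(
        g[k] * b[k] * sum(W[k][j] * a[j] for j in range(len(a)))
        for k in range(len(b))
    )


def back_project(b, W, g):
    """Candidate maximizer: a_j = sign(c_j) with c = W^T (g ⊙ b).

    Ties (c_j == 0) are broken toward +1, an arbitrary choice for this sketch.
    """
    Ka = len(W[0])
    c = [sum(W[k][j] * g[k] * b[k] for k in range(len(b))) for j in range(Ka)]
    return [1 if cj >= 0 else -1 for cj in c]


rng = random.Random(0)
Kb, Ka = 5, 4
b = [rng.choice((-1, 1)) for _ in range(Kb)]
W = [[rng.choice((-1, 1)) for _ in range(Ka)] for _ in range(Kb)]
g = [rng.choice((0, 1)) for _ in range(Kb)]

a_star = back_project(b, W, g)

# Exhaustive search over all 2**Ka binary vectors (feasible only for tiny Ka).
best = max(
    gated_objective(b, W, g, list(a)) for a in product((-1, 1), repeat=Ka)
)
is_optimal = gated_objective(b, W, g, a_star) == best
print(is_optimal)
```

Because the assumed objective decomposes into independent terms $a_j c_j$, choosing each $a_j$ to match the sign of $c_j$ maximizes every term separately, which is why the closed-form candidate matches the exhaustive search.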

Figures (8)

  • Figure 1: Information flow for a sample $\mu$ in an MLP and an RNN trained with BEP. Each model uses a binary core and a fixed classifier. The forward and backward passes are shown in gray and red, respectively.
  • Figure 2: Test accuracy as a function of the number of parameters on Random Prototypes, FashionMNIST, CIFAR10, and Imagenette. Results compare BEP with both the state-of-the-art (SotA) approach [colombo2025training] and QAT-based methods for binary MLPs with $L=2$ and $L=3$ hidden layers.
  • Figure 3: Validation accuracy of a binary RNN trained with BEP on the S-MNIST dataset for different values of the gating threshold hyperparameter $\nu$ and window length WL.
  • Figure 4: Validation accuracy as a function of the gating threshold $\nu$ on Random Prototypes, FashionMNIST, CIFAR10, and Imagenette for binary MLPs with $L=3$ and $L=5$ hidden layers.
  • Figure 5: Validation accuracy as a function of the robustness $r$ on Random Prototypes, FashionMNIST, CIFAR10, and Imagenette for binary MLPs with $L=2$ hidden layers.
  • ...and 3 more figures

Theorems & Definitions (9)

  • Lemma 1: Desired activations
  • Proposition 1: BEP back-projection solves the linear surrogate exactly
  • proof
  • Lemma 2: Convex relaxation has an integral optimum
  • proof
  • proof
  • Lemma 3: Local update correctness on the stabilities
  • proof
  • Remark 1: From stability to visible pre-activation