An experimental comparative study of backpropagation and alternatives for training binary neural networks for image classification

Ben Crulis, Barthelemy Serres, Cyril de Runz, Gilles Venturini

TL;DR

The paper addresses the challenge of training binary neural networks for image classification on edge devices by evaluating multiple learning algorithms beyond standard backpropagation, including DFA, DRTP, FA, HSIC Bottleneck, and SigpropTL, across three architectures on ImageNette and related datasets. It extends prior work by testing more architectures, larger datasets, and adding two new BP alternatives, while providing an open-source PyTorch framework for model-agnostic training of binary and non-binary networks. Key findings show that, on modern architectures with skip connections, backpropagation generally yields the best accuracy, but several alternatives offer substantial reductions in memory and computation and can exceed BP in certain architectures or ablations (e.g., without skip connections). The work provides practical guidance for when to use BP versus alternatives and highlights the trade-offs between accuracy and resource efficiency, informing edge-deployed vision systems and future research avenues.

Abstract

Current artificial neural networks are trained with parameters encoded as floating-point numbers that occupy a large amount of memory at inference time. As deep learning models grow in size, it is becoming very difficult to train and run artificial neural networks on edge devices. Binary neural networks promise to reduce the size of deep neural network models, increase inference speed, and decrease energy consumption. Thus, they may allow the deployment of more powerful models on edge devices. However, binary neural networks remain difficult to train with backpropagation-based gradient descent. This paper extends the work of \cite{crulis2023alternatives}, which adapted to binary neural networks two promising alternatives to backpropagation originally designed for continuous neural networks and evaluated them on simple image classification datasets. This paper presents new experiments on the ImageNette dataset, compares three different model architectures for image classification, and adds two additional alternatives to backpropagation.

Paper Structure (30 sections, 2 figures, 9 tables)

This paper contains 30 sections, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Summary of the algorithms tested in the following experiments. The $B_k$ matrices are constant random matrices initialized once before starting to train the model. Adapted from Frenkel2021.
  • Figure 2: Summary of the chosen binarization scheme effective at each binary weight layer. Here a single layer of the model at index $k$ is represented with both the linear layer using the binary weights $W^b_k$ and the activation function $f$.
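
The two captions above mention fixed random matrices $B_k$ and binary weights $W^b_k$ followed by an activation $f$. As a rough illustration of how these pieces can fit together, the sketch below runs one Direct Feedback Alignment (DFA) update on a tiny fully connected network. It is not the paper's implementation: the choice $f = \tanh$, the sign binarization of real-valued latent weights, the straight-through style update of those latent weights, the layer sizes, and all function names (`binarize`, `forward`, `dfa_step`) are assumptions made for this example.

```python
import math
import random


def sign(v):
    # Binarize to {-1, +1}; mapping zero to +1 is an arbitrary convention assumed here.
    return 1.0 if v >= 0 else -1.0


def binarize(W):
    # W^b_k = sign(W_k), applied elementwise to the real-valued latent weights.
    return [[sign(w) for w in row] for row in W]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def forward(latent_weights, x):
    """Forward pass using binary weights and f = tanh (assumed activation)."""
    activations = [x]
    for W in latent_weights:
        x = [math.tanh(a) for a in matvec(binarize(W), x)]
        activations.append(x)
    return activations


def dfa_step(latent_weights, feedback, x, target, lr=0.01):
    """One DFA update: hidden layer k receives the output error through fixed B_k."""
    acts = forward(latent_weights, x)
    error = [y - t for y, t in zip(acts[-1], target)]
    last = len(latent_weights) - 1
    for k, W in enumerate(latent_weights):
        # The output layer uses the error directly; hidden layers project it with B_k.
        signal = error if k == last else matvec(feedback[k], error)
        # tanh'(a) = 1 - tanh(a)^2, computed from the stored layer output.
        delta = [s * (1.0 - o * o) for s, o in zip(signal, acts[k + 1])]
        # Assumption: the update is applied to the latent weights as if the
        # binarization were the identity (straight-through style).
        for i, d in enumerate(delta):
            for j, xj in enumerate(acts[k]):
                W[i][j] -= lr * d * xj
    return error


rng = random.Random(0)
sizes = [4, 8, 8, 3]  # hypothetical layer widths: input, two hidden layers, output

# Real-valued latent weights, one matrix of shape (n_out, n_in) per layer.
latent = [
    [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes, sizes[1:])
]

# B_k: one constant random matrix per hidden layer, shape (n_hidden, n_outputs),
# drawn once before training and never updated.
feedback = [
    [[rng.uniform(-1, 1) for _ in range(sizes[-1])] for _ in range(n_hidden)]
    for n_hidden in sizes[1:-1]
]

error = dfa_step(latent, feedback, [0.5, -0.2, 0.1, 0.9], [1.0, -1.0, -1.0])
```

Because each hidden layer's learning signal depends only on the output error and its own $B_k$, no error needs to be propagated backward layer by layer, which is where memory and computation savings of this family of methods would come from.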