Expanding-and-Shrinking Binary Neural Networks

Xulong Shi, Caiyi Sun, Zhi Qi, Liu Hao, Xiaodong Yang

TL;DR

Binary neural networks (BNNs) offer substantial speed and memory benefits but lose accuracy because each layer has limited representation capacity. The authors introduce the Expanding-and-Shrinking (ES) operation, combined with Binary Group Convolution (G), which boosts capacity with a negligible increase in compute. For Transformers, the method augments the FFN with minimal change to attention cost. Key contributions include formalizing representation capacity, detailing the ES and G components, applying them to CNNs and Transformers, and showing consistent gains on image classification, object detection, and diffusion-model super-resolution (SR). The approach enables more accurate, deployment-friendly BNNs across diverse tasks, with a public code release.

Abstract

While binary neural networks (BNNs) offer significant benefits in terms of speed, memory and energy, they suffer substantial accuracy degradation on challenging tasks compared to their real-valued counterparts. Because both weights and activations are binarized, the possible values of each entry in the feature maps generated by BNNs are strongly constrained. To tackle this limitation, we propose the expanding-and-shrinking operation, which enhances binary feature maps with a negligible increase in computational complexity, thereby strengthening the representation capacity. Extensive experiments on multiple benchmarks show that our approach generalizes well across diverse applications, ranging from image classification and object detection to generative diffusion models, while also achieving remarkable improvements over various leading binarization algorithms based on different architectures, including both CNNs and Transformers.
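
The constraint mentioned in the abstract can be made concrete with a small sketch. Assuming weights and activations both take values in {-1, +1} (a common BNN convention; the paper's exact formulation is not given here), a dot product over n binary pairs can only land on n + 1 distinct integers. The function name `binary_dot_values` is hypothetical and used only for illustration; this is not the paper's formal definition of representation capacity.

```python
from itertools import product

def binary_dot_values(n):
    """Enumerate every value a dot product of two {-1,+1}^n vectors can take."""
    values = set()
    for w in product((-1, 1), repeat=n):      # all binary weight vectors
        for a in product((-1, 1), repeat=n):  # all binary activation vectors
            values.add(sum(wi * ai for wi, ai in zip(w, a)))
    return sorted(values)

# Brute-force enumeration for a few small fan-in sizes.
for n in (1, 2, 3, 4):
    vals = binary_dot_values(n)
    print(f"n={n}: {len(vals)} distinct values {vals}")
```

Each output entry is restricted to the integers from -n to n that share the parity of n, which is far fewer than a real-valued layer of the same size can produce.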

Paper Structure

This paper contains 18 sections, 3 equations, 11 figures, and 9 tables.

Figures (11)

  • Figure 1: Overview of performance improvements for image classification (top-1 accuracy) on ImageNet, object detection (mAP) on PASCAL VOC, and diffusion-model-based image super-resolution (PSNR) on Manga. With the proposed ES-BNN, various binary neural networks based on both CNNs and Transformers obtain consistent and significant performance gains.
  • Figure 2: A schematic overview of the proposed approach ES-BNN utilized in each individual layer of a binary neural network. (a) shows a standard binary convolution layer. (b) illustrates the expanding-and-shrinking operation applied on top of (a). In (c) the binary group convolution is further integrated based on (b).
  • Figure 3: Illustration of how the replicated feature maps are diversified in the proposed approach. (a) shows the input image. (b) presents 8 replicated feature maps (in one row) from the 7th layer of ResNet-20. (c) shows the feature maps in (b) after batch normalization. (d) shows the feature maps in (c) after the residual connection.
  • Figure 4: Comparison of the representation capacity of each individual layer before and after applying ES-BNN in a standard architecture ResNet-18 and a customized architecture BNext.
  • Figure 5: Comparison of our approach with different combinations of the proposed expanding-and-shrinking operation (ES) as well as the binary group convolution (G) using ResNet-20 on CIFAR-10.
  • ...and 6 more figures