Designed Dithering Sign Activation for Binary Neural Networks

Brayan Monroy; Juan Estupiñan; Tatiana Gelvez-Barrera; Jorge Bacca; Henry Arguello

TL;DR

This work tackles the information loss that Binary Neural Networks (BNNs) suffer when activations are binarized. It introduces DeSign, a designed dithering Sign activation that uses a spatially periodic threshold kernel, optimized to preserve structural information while keeping all computations binary. The threshold kernel is designed in two steps: a brute-force selection based on total variation, followed by an entry-scaling step that fits the kernel entries to the batch-normalization (BN) distribution. A 2D variant and a 3D variant (DeSign3D) exploit spatial and channel correlations, respectively. Empirical evaluations on CIFAR-10/100 and STL-10 show that DeSign improves accuracy over standard Sign-based BNNs by up to 4.51% and reduces sensitivity to the learned batch-normalization parameters, without additional computational cost. The method offers a practical path to higher-performing, energy-efficient BNNs and can be extended to other activations and to end-to-end design strategies.

Abstract

Binary Neural Networks emerged as a cost-effective and energy-efficient solution for computer vision tasks by binarizing either network weights or activations. However, common binary activations, such as the Sign activation function, abruptly binarize the values with a single threshold, losing fine-grained details in the feature outputs. This work proposes an activation that applies multiple thresholds following dithering principles, shifting the Sign activation function for each pixel according to a spatially periodic threshold kernel. Unlike existing methods, the shifting is defined jointly for a set of adjacent pixels, taking advantage of spatial correlations. Experiments on the classification task demonstrate the effectiveness of the designed dithering Sign activation function (DeSign) as an alternative activation for binary neural networks, without increasing the computational cost. Further, DeSign balances the preservation of details with the efficiency of binary operations.
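The per-pixel shifting described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `design_activation`, the example threshold values, the periodic-tiling reading of the threshold kernel, and the convention that the Sign of zero maps to +1 are all assumptions made for illustration.

```python
import numpy as np

def design_activation(x_s, t):
    """Dithered Sign sketch: shift each pixel by a tiled threshold, then binarize.

    x_s : 2D array, assumed to be a batch-normalized feature map.
    t   : 2D threshold kernel (hypothetical values), repeated periodically.
    """
    h, w = x_s.shape
    k_h, k_w = t.shape
    # Repeat the kernel enough times to cover the map, then crop to h x w.
    reps = (-(-h // k_h), -(-w // k_w))  # ceiling division
    thresholds = np.tile(t, reps)[:h, :w]
    # Assumed convention: values on the threshold map to +1.
    return np.where(x_s - thresholds >= 0, 1, -1)

# A constant feature map: a single-threshold Sign cannot distinguish pixels,
# while the dithered version produces a spatial pattern.
x_s = np.full((2, 4), 0.2)
t = np.array([[-0.5, 0.0],
              [0.5, 1.0]])
out = design_activation(x_s, t)
plain = np.where(x_s >= 0, 1, -1)
```

In this toy case `plain` is all +1, whereas `out` alternates between rows of +1 and -1, because the same input value is compared against different thresholds depending on its position within the kernel period.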

Paper Structure (19 sections, 9 equations, 5 figures, 3 tables)

Introduction
Binary Neural Networks Background
Binary Convolution Layer
Batch-normalization Layer
Activation Layer
Real-valued Activation
Binary-valued Activation
DeSign: Designed Dithering Sign Activation
Threshold Kernel Design
Threshold Kernel Selection
Entry Scaling to Batch Normalization
3D scenario Design
Simulations and Results
Comparison Benchmark
Selection of Design Strategy
...and 4 more sections

Figures (5)

Figure 1: Illustration of the output when applying the ReLU, Sign, and proposed DeSign activations to a reference image. (Top) Generated activation maps. (Bottom) Zoom of a specific output patch. Although the Sign and DeSign outputs are both entirely binary, DeSign better represents the structure and preserves fine-grained details within the image.
Figure 2: Binary forward propagation scheme with the proposed DeSign activation. (a) The input $\mathbf{X} \in \mathbb{Z}_2^{h \times w}$ is convolved with binary kernels $\mathbf{K} \in \mathbb{Z}_2^{k \times k}$. (b) The output $\mathbf{X}_c$ is batch-normalized across the features using the trainable parameters $\gamma$ and $\beta$. (c) The batch-normalized output $\mathbf{X}_s$ is passed through the DeSign activation. Specifically, the threshold kernel $\mathbf{T}$ is incorporated in the third layer through the operation $\mathbf{X}_s-(\mathbf{T} \otimes \mathbf{1})$, which imposes a dithering structure that helps preserve information. Then, the conventional Sign activation is applied, obtaining the binary output $\mathbf{X}_b \in \mathbb{Z}_2^{(h-k+1) \times (w-k+1)}$.
Figure 3: Total Variation score of all threshold kernel candidates. (a) Ordered TV score, (b) top-5 threshold kernels with the highest TV score, and (c) bottom-5 threshold kernels with the lowest TV score.
Figure 4: Distribution range estimation. 1) When using the Sign, there are only three options: all values negative, i.e., $[-3\sigma, 0]$; all values positive, i.e., $[0, 3\sigma]$; or mixed, i.e., $[-3\sigma, 3\sigma]$. 2) When using the proposed thresholds, different ranges are possible using the reference threshold values $t_\kappa$, increasing the precision and approximating the behavior of the ReLU function.
Figure 5: Intermediate output activations of a BNN architecture on the STL-10 dataset. (Top) Binary convolution outputs, (Middle) Sign activation outputs, and (Bottom) DeSign activation outputs. Incorporating DeSign activations preserves fine details throughout the BNN without additional computational cost.
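
The ranking of threshold kernel candidates by Total Variation shown in Figure 3 can be sketched as a small brute-force search. This is a hedged illustration only: the page does not state what the TV score is computed over, which candidate set is enumerated, or the kernel size. The sketch assumes an anisotropic TV computed on the kernel entries with wrap-around neighbours (since the kernel is spatially periodic), a hypothetical level set `(-1.0, 0.0, 1.0)`, and a 2x2 kernel; the helper names `periodic_tv` and `rank_kernels` are made up for this example.

```python
import itertools
import numpy as np

def periodic_tv(kernel):
    """Anisotropic total variation with wrap-around neighbours (assumed score)."""
    dv = np.abs(kernel - np.roll(kernel, 1, axis=0)).sum()
    dh = np.abs(kernel - np.roll(kernel, 1, axis=1)).sum()
    return float(dv + dh)

def rank_kernels(levels, k=2):
    """Enumerate every k x k kernel over `levels` and sort by descending TV."""
    candidates = [
        np.array(entries, dtype=float).reshape(k, k)
        for entries in itertools.product(levels, repeat=k * k)
    ]
    return sorted(candidates, key=periodic_tv, reverse=True)

# Hypothetical candidate levels; the real design may use other values.
ranked = rank_kernels(levels=(-1.0, 0.0, 1.0), k=2)
top5, bottom5 = ranked[:5], ranked[-5:]
```

Under these assumptions, checkerboard-like kernels score highest and constant kernels score zero, which matches the intuition that a high-TV kernel spreads distinct thresholds across adjacent pixels.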