Light Differentiable Logic Gate Networks

Lukas Rüttgers, Till Aczel, Andreas Plesner, Roger Wattenhofer

TL;DR

This work addresses the scalability challenges of differentiable logic gate networks (DLGNs) by tracing their root cause to the parametrization of logic gate neurons. It introduces an input-wise parametrization (IWP) that reduces the parameter count from $2^{2^n}$ to $2^n$ per gate with $n$ inputs and, when paired with negation-asymmetric heavy-tail initializations (RI), enables deeper networks with improved gradient stability and substantially faster training. Empirical results show a 4x memory reduction, up to 1.86x faster backward passes, and up to 8.5x fewer training steps, while CIFAR-100 accuracy remains stable and is sometimes better than with the original parametrization. The paper also discusses remaining expressivity and generalization gaps, and it suggests future work on learning gates with more inputs and on encoding-aware architectures to further improve DLGN performance.

Abstract

Differentiable logic gate networks (DLGNs) exhibit extraordinary efficiency at inference while sustaining competitive accuracy. But vanishing gradients, discretization errors, and high training cost impede scaling these networks. Even with dedicated parameter initialization schemes from subsequent works, increasing depth still harms accuracy. We show that the root cause of these issues lies in the underlying parametrization of logic gate neurons themselves. To overcome this issue, we propose a reparametrization that also shrinks the parameter size logarithmically in the number of inputs per gate. For binary inputs, this already reduces the model size by 4x, speeds up the backward pass by up to 1.86x, and converges in 8.5x fewer training steps. On top of that, we show that the accuracy on CIFAR-100 remains stable and sometimes superior to the original parametrization.
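
To make the difference between the two parametrizations concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the original parametrization mixes all $2^{2^n}$ Boolean functions with softmax weights, and that the input-wise parametrization stores one learnable value per truth-table row, squashed by a sigmoid and interpolated multilinearly over relaxed inputs in $[0, 1]$. The function names (`gate_original`, `gate_input_wise`, `corner_weights`, `param_counts`) are hypothetical, and the exact form used in the paper may differ.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def corner_weights(inputs):
    """Weight of each of the 2^n Boolean input combinations, treating every
    relaxed input a_j in [0, 1] as an independent probability of being 1."""
    weights = []
    for bits in itertools.product((0, 1), repeat=len(inputs)):
        w = 1.0
        for a, b in zip(inputs, bits):
            w *= a if b else 1.0 - a
        weights.append(w)
    return weights


def gate_original(logits, inputs):
    """Assumed original form: softmax mixture over all 2^(2^n) gates."""
    n_rows = 2 ** len(inputs)
    assert len(logits) == 2 ** n_rows
    probs = softmax(logits)
    corners = corner_weights(inputs)
    out = 0.0
    for g, p in enumerate(probs):
        # The truth table of gate g is read off the binary digits of g.
        table = [(g >> r) & 1 for r in range(n_rows)]
        out += p * sum(t * c for t, c in zip(table, corners))
    return out


def gate_input_wise(omega, inputs):
    """Assumed input-wise form: one learnable value per truth-table row."""
    assert len(omega) == 2 ** len(inputs)
    corners = corner_weights(inputs)
    return sum(sigmoid(w) * c for w, c in zip(omega, corners))


def param_counts(n):
    """(original, input-wise) parameters per gate with n inputs."""
    return 2 ** (2 ** n), 2 ** n


if __name__ == "__main__":
    print(param_counts(2))  # (16, 4)
    and_like = [-10.0, -10.0, -10.0, 10.0]  # only row (1, 1) maps to ~1
    print(gate_input_wise(and_like, [1.0, 1.0]))
    print(gate_input_wise(and_like, [1.0, 0.0]))
```

Under these assumptions, both functions return a value in $[0, 1]$ for the same relaxed inputs, but the input-wise gate carries 4 values instead of 16 for two inputs, which matches the 4x reduction quoted above.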

Paper Structure

This paper contains 60 sections, 11 equations, 23 figures, 1 table.

Figures (23)

  • Figure 1: For a CIFAR-10 DLGN [PetersenBKD22], our reparametrized DLGNs require 4x less memory, converge in 8.5x fewer training steps, and perform the forward and backward passes in up to 8% and 45% less time, respectively. Details are given in the results section and in the appendix on training efficiency.
  • Figure 2: Illustration of the reparametrization for logic gates with one input. It requires only $2^n$ learnable parameters $\Omega$ for $n$ inputs, as opposed to $2^{2^n}$ for the original parametrization.
  • Figure 3: Distribution of gate outputs for an IWP DLGN right after residual initialization (RI), averaged over 100 CIFAR-100 images. RI thereby postpones gate learning in later layers until earlier layers are more refined. This incremental refinement makes it possible to learn complex deep networks.
  • Figure 4: Discretized test accuracy, averaged over three seeds, when scaling the CIFAR-10 M DLGN [PetersenBKD22] and the CDLGN [PetersenKBWE24] in depth.
  • Figure 5: Training times for the DLGN with 20-fold depth. Mean and standard deviation were computed over 20 batches of CIFAR-100.
  • ...and 18 more figures
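
The parameter counts quoted in the Figure 2 caption can be checked by hand. The worked arithmetic below assumes one real-valued parameter per entry and reads "binary inputs" in the abstract as gates with $n = 2$ inputs, which is an interpretation rather than a statement from the source.

```latex
\begin{align*}
n = 1:&\quad 2^{2^1} = 4   \quad\text{vs.}\quad 2^1 = 2 \\
n = 2:&\quad 2^{2^2} = 16  \quad\text{vs.}\quad 2^2 = 4 \qquad (16 / 4 = 4) \\
n = 3:&\quad 2^{2^3} = 256 \quad\text{vs.}\quad 2^3 = 8 \\
\text{in general:}&\quad \log_2\!\left(2^{2^n}\right) = 2^n
\end{align*}
```

For $n = 2$ the ratio is 4, consistent with the reported 4x reduction in model size, and the new count is the base-2 logarithm of the original count.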