Light Differentiable Logic Gate Networks
Lukas Rüttgers, Till Aczel, Andreas Plesner, Roger Wattenhofer
TL;DR
This work addresses the scalability challenges of differentiable logic gate networks (DLGNs) by tracing their root cause to the parametrization of logic gate neurons. It introduces an input-wise parametrization (IWP) that reduces the parameter count per gate from $2^{2^n}$ to $2^n$ for a gate with $n$ inputs and, when paired with negation-asymmetric heavy-tail initializations (RI), enables deeper networks with improved gradient stability and substantially faster training. Empirical results show a 4x memory reduction, up to 1.86x faster backward passes, and up to 8.5x fewer training steps, with CIFAR-100 accuracy remaining stable or better than with the original parametrization. The paper also discusses the remaining expressivity and generalization gaps and suggests future work on learning gates with more inputs and on encoding-aware architectures to further improve DLGN performance.
Abstract
Differentiable logic gate networks (DLGNs) exhibit extraordinary efficiency at inference while sustaining competitive accuracy. But vanishing gradients, discretization errors, and high training cost impede scaling these networks. Even with the dedicated parameter initialization schemes proposed in subsequent works, increasing depth still harms accuracy. We show that the root cause of these issues lies in the underlying parametrization of the logic gate neurons themselves. To overcome them, we propose a reparametrization that also shrinks the parameter size logarithmically in the number of inputs per gate. For two-input (binary) gates, this already reduces the model size by 4x, speeds up the backward pass by up to 1.86x, and converges in 8.5x fewer training steps. On top of that, we show that the accuracy on CIFAR-100 remains stable and is sometimes superior to that of the original parametrization.
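To make the 4x figure concrete, the following is a minimal sketch of where the parameter saving for two-input gates could come from. It assumes that the original parametrization keeps one logit per Boolean function of the gate's inputs (16 for two inputs) and that the input-wise parametrization keeps one logit per input combination, that is, one per truth-table row. The function names and the specific relaxation (a sigmoid per row, combined by multilinear interpolation) are illustrative assumptions and are not taken from the paper.

```python
import math
from itertools import product


def function_wise_param_count(n: int) -> int:
    """Logits needed when a gate chooses among all Boolean functions of n inputs."""
    return 2 ** (2 ** n)


def input_wise_param_count(n: int) -> int:
    """Logits needed when a gate stores one value per input combination."""
    return 2 ** n


def input_wise_gate(x, logits):
    """Hypothetical relaxed gate (assumption, not the paper's exact formulation).

    x:      n relaxed inputs, each in [0, 1]
    logits: 2**n real parameters, one per truth-table row, ordered like
            itertools.product((0, 1), repeat=n)
    """
    n = len(x)
    assert len(logits) == 2 ** n
    out = 0.0
    for logit, bits in zip(logits, product((0, 1), repeat=n)):
        # Weight of this truth-table row, treating the inputs as independent
        # probabilities of being 1.
        weight = 1.0
        for xi, b in zip(x, bits):
            weight *= xi if b else 1.0 - xi
        # Relaxed truth-table entry for this row.
        out += weight * (1.0 / (1.0 + math.exp(-logit)))
    return out


# Saturated logits that encode a two-input AND gate: only row (1, 1) outputs 1.
and_logits = [-50.0, -50.0, -50.0, 50.0]
ratio = function_wise_param_count(2) // input_wise_param_count(2)
```

Under these assumptions, `ratio` is 4 for two-input gates, which matches the reported model size reduction, and `input_wise_gate([1.0, 1.0], and_logits)` is close to 1 while any input pair containing a 0 yields a value close to 0.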
