Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks
Shakir Yousefi, Andreas Plesner, Till Aczel, Roger Wattenhofer
TL;DR
This work tackles the discretization gap and slow training in differentiable logic gate networks by introducing Gumbel Logic Gate Networks, which inject Gumbel noise into gate selection and use a straight-through estimator to align training with inference. The authors prove that Gumbel perturbations implicitly regularize the Hessian trace, yielding smoother loss landscapes and faster convergence, while reducing sensitivity to discretization. Empirically, GLGNs outperform prior DLGNs on CIFAR-10/100, achieving up to 4.5× faster training, a 98% reduction in the discretization gap, and near-zero unused gates, with benefits that scale with depth. The results suggest that stochastic gate selection combined with straight-through discretization can substantially improve the practicality of differentiable LGNs for efficient image classification and broader NAS-like search spaces.
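The following minimal NumPy sketch shows the training-time mechanism for a single logic gate neuron that uses Gumbel noise and a straight-through estimator. It is not the authors' implementation. The function names, the gate ordering, the temperature `tau`, and the use of the standard probabilistic relaxations of the 16 two-input gates (as in earlier differentiable LGN work) are all assumptions made for illustration. In the forward pass, the gate logits are perturbed with Gumbel(0, 1) noise, and only the single hard-selected gate is evaluated, just as in a discretized network. In the backward pass, the gradient is routed through the softmax relaxation instead.

```python
import numpy as np


def relaxed_gates(a, b):
    """Real-valued relaxations of all 16 two-input logic gates (illustrative ordering).

    Reading a, b in [0, 1] as probabilities of being True, each entry is the
    probability that the gate outputs True, assuming independent inputs.
    """
    a, b = np.broadcast_arrays(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    ab = a * b
    return np.stack([
        np.zeros_like(ab),     # FALSE
        ab,                    # AND
        a - ab,                # A AND NOT B
        a,                     # A
        b - ab,                # NOT A AND B
        b,                     # B
        a + b - 2 * ab,        # XOR
        a + b - ab,            # OR
        1 - (a + b - ab),      # NOR
        1 - (a + b - 2 * ab),  # XNOR
        1 - b,                 # NOT B
        1 - b + ab,            # A OR NOT B
        1 - a,                 # NOT A
        1 - a + ab,            # NOT A OR B
        1 - ab,                # NAND
        np.ones_like(ab),      # TRUE
    ])


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def gumbel_st_neuron(logits, a, b, tau, rng):
    """One gate neuron with Gumbel-perturbed, straight-through gate selection (hypothetical helper).

    Returns the output computed with a single hard-selected gate, a function
    mapping dL/d(output) to dL/d(logits), and the sampled noise (for inspection).
    """
    noise = rng.gumbel(size=logits.shape)  # Gumbel(0, 1) perturbation of the logits
    p = softmax((logits + noise) / tau)    # relaxed gate distribution
    hard = np.zeros_like(p)
    hard[np.argmax(p)] = 1.0               # one-hot: the discretized gate choice
    gates = relaxed_gates(a, b)            # shape (16,) for scalar inputs
    out = hard @ gates                     # forward pass uses only the hard gate

    def backward(grad_out):
        # Straight-through: treat d(hard)/d(p) as the identity...
        grad_p = grad_out * gates
        # ...then backpropagate through the softmax and the temperature.
        return p * (grad_p - p @ grad_p) / tau

    return out, backward, noise


rng = np.random.default_rng(0)
logits = rng.normal(size=16)  # one learnable logit per candidate gate
out, backward, noise = gumbel_st_neuron(logits, a=0.9, b=0.2, tau=0.5, rng=rng)
grad_logits = backward(1.0)   # gradient of L = out with respect to the logits
```

Because the forward value comes from one discrete gate, the loss seen during training in this sketch is already the loss of a discretized neuron. The softmax path exists only to supply gradients. Dropping the noise and taking the argmax of `logits` would give the gate that this neuron keeps at inference.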
Abstract
Modern neural networks achieve state-of-the-art performance on numerous benchmarks; however, their high computational requirements and energy consumption push researchers toward more efficient solutions for real-world deployment. Logic gate networks (LGNs) learn a large network of logic gates for efficient image classification. However, training a network that solves even a relatively simple task like CIFAR-10 can take days to weeks. Even then, almost half of the network remains unused, causing a discretization gap. This gap, the drop in accuracy between the relaxed network used during training and the discretized network used at inference, hinders real-world deployment of LGNs. We inject Gumbel noise with a straight-through estimator during training to significantly speed up training, improve neuron utilization, and decrease the discretization gap. We theoretically show that this results from implicit Hessian regularization, which improves the convergence properties of LGNs. We train networks $4.5 \times$ faster in wall-clock time, reduce the discretization gap by $98\%$, and reduce the number of unused gates by $100\%$.
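The generic second-order perturbation argument illustrates how noise injection relates to the Hessian trace. Let $\epsilon$ be a zero-mean perturbation with covariance $\Sigma$, and let $H$ be the Hessian of $L$ at $\theta$. A Taylor expansion gives $\mathbb{E}[L(\theta+\epsilon)] \approx L(\theta) + \tfrac{1}{2}\operatorname{tr}(H\Sigma)$. Gumbel(0, 1) noise has mean $\gamma$ (Euler's constant), so it must be centred first; the shift leaves a softmax over the logits unchanged. For i.i.d. centred Gumbel noise, $\Sigma = (\pi^2/6) I$, so the implied penalty is $(\pi^2/12)\operatorname{tr}(H)$. The sketch below checks this on a toy quadratic loss, where the expansion is exact. It illustrates the idea under these assumptions and is not the paper's theorem, whose setting and constants may differ. The function names are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # mean of a Gumbel(0, 1) random variable


def quadratic_loss(theta, A, b):
    """Toy loss L(theta) = 0.5 theta^T A theta + b^T theta, whose Hessian is A."""
    return 0.5 * theta @ A @ theta + b @ theta


def mean_noisy_loss(theta, A, b, n_samples, rng):
    """Monte Carlo estimate of E[L(theta + eps)] for centred Gumbel noise eps."""
    eps = rng.gumbel(size=(n_samples, theta.size)) - EULER_GAMMA  # zero-mean noise
    x = theta + eps                                                # (n_samples, d)
    losses = 0.5 * np.einsum("ni,ij,nj->n", x, A, x) + x @ b
    return losses.mean()


rng = np.random.default_rng(0)
d = 4
M = rng.normal(size=(d, d))
A = M @ M.T                  # symmetric positive semi-definite Hessian
b = rng.normal(size=d)
theta = rng.normal(size=d)

clean = quadratic_loss(theta, A, b)
noisy = mean_noisy_loss(theta, A, b, n_samples=200_000, rng=rng)
# Second-order prediction: 0.5 * Var[eps] * tr(A), with Var[eps] = pi^2 / 6
predicted_penalty = (np.pi ** 2 / 12) * np.trace(A)
print(f"measured increase: {noisy - clean:.3f}, predicted: {predicted_penalty:.3f}")
```

On a non-quadratic loss, the third- and higher-order terms no longer vanish (Gumbel noise is skewed), so the trace penalty is only the leading-order effect.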
