From MNIST to ImageNet: Understanding the Scalability Boundaries of Differentiable Logic Gate Networks
Sven Brändle, Till Aczel, Andreas Plesner, Roger Wattenhofer
TL;DR
This work evaluates the scalability of Differentiable Logic Gate Networks (DLGNs) to large-class problems, focusing on the Group-Sum output layer and the impact of the temperature parameter $\tau$ on expressiveness and pruning. Using synthetic data, ImageNet-32, and MNIST-like datasets, the authors show that higher backbone capacity and careful $\tau$ tuning enable competitive performance up to several hundred classes and can outperform standard MLPs in some large-class regimes, though real-world data such as ImageNet-32 remain challenging. The study also compares alternative output designs and shows that $\tau$-dependent neuron utilization is a key mechanism behind scalability and robustness. Overall, the results suggest that DLGNs are promising for structured, multi-class tasks but require architectural and optimization advances to match CNN-based models on complex natural images. The findings have practical implications for energy-efficient, hardware-friendly inference where logic-based computation can be advantageous, especially with improved output-layer strategies and regularization techniques.
Abstract
Differentiable Logic Gate Networks (DLGNs) are a fast and energy-efficient alternative to conventional feed-forward networks. With learnable combinations of logic gates, DLGNs enable fast inference through hardware-friendly execution. Since the concept of DLGNs has only recently gained attention, these networks are still in their developmental infancy, including the design and scalability of their output layer. To date, this architecture has primarily been tested on datasets with up to ten classes. This work examines the behavior of DLGNs on large multi-class datasets. We investigate their general expressiveness and scalability, and we evaluate alternative output strategies. Using both synthetic and real-world datasets, we provide key insights into the importance of temperature tuning and its impact on output layer performance. We evaluate the conditions under which the Group-Sum layer performs well and how it can be applied to large-scale classification of up to 2000 classes.
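To make the role of the temperature concrete, the following is a minimal sketch of a Group-Sum output layer. It assumes the common formulation in which the binary outputs of the last logic layer are split into equal-sized groups, one per class, and each group's count is divided by $\tau$ to form a logit. The abstract does not spell out this definition, so the function names `group_sum` and `softmax` and the toy input are illustrative assumptions rather than the authors' implementation.

```python
import math


def group_sum(bits, num_classes, tau):
    """Sum gate outputs in equal-sized groups (one per class), then divide by tau.

    Assumed formulation: logit_c = (sum of the gate outputs in group c) / tau.
    """
    if len(bits) % num_classes != 0:
        raise ValueError("number of gate outputs must be divisible by num_classes")
    group_size = len(bits) // num_classes
    return [
        sum(bits[c * group_size:(c + 1) * group_size]) / tau
        for c in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example (hypothetical values): 12 gate outputs, 3 classes, 4 gates per class.
bits = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
for tau in (1.0, 4.0):
    logits = group_sum(bits, num_classes=3, tau=tau)
    probs = softmax(logits)
    print(f"tau={tau}: logits={logits}, probs={[round(p, 3) for p in probs]}")
```

Under this assumed formulation, each logit lies in the range from 0 to the group size divided by $\tau$. For a fixed output width, increasing the number of classes shrinks each group, so the attainable logit range narrows unless $\tau$ is adjusted. This is one plausible reason why temperature tuning matters more as the class count grows.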
