From MNIST to ImageNet: Understanding the Scalability Boundaries of Differentiable Logic Gate Networks

Sven Brändle, Till Aczel, Andreas Plesner, Roger Wattenhofer

TL;DR

This work evaluates the scalability of Differentiable Logic Gate Networks (DLGNs) to large-class problems, focusing on the Group-Sum output layer and the impact of the temperature parameter $\tau$ on expressiveness and pruning. By combining synthetic data, ImageNet-32, and MNIST-like datasets, the authors show that higher backbone capacity and careful $\tau$ tuning enable competitive performance up to several hundred classes and can outperform standard MLPs in some large-class regimes, though real-world data like ImageNet-32 remain challenging. The study also compares alternative output designs and demonstrates that $\tau$-driven neuron utilization is a key mechanism behind scalability and robustness. Overall, the results suggest that DLGNs are promising for structured, multi-class tasks but require architectural and optimization advances to match CNN-based models on complex natural images. The findings have practical implications for energy-efficient, hardware-friendly inference where logic-based computation can be advantageous, especially with improved output-layer strategies and regularization techniques.

Abstract

Differentiable Logic Gate Networks (DLGNs) are a very fast and energy-efficient alternative to conventional feed-forward networks. With learnable combinations of logical gates, DLGNs enable fast inference by hardware-friendly execution. Since the concept of DLGNs has only recently gained attention, these networks are still in their developmental infancy, including the design and scalability of their output layer. To date, this architecture has primarily been tested on datasets with up to ten classes. This work examines the behavior of DLGNs on large multi-class datasets. We investigate their general expressiveness and scalability, and evaluate alternative output strategies. Using both synthetic and real-world datasets, we provide key insights into the importance of temperature tuning and its impact on output layer performance. We evaluate the conditions under which the Group-Sum layer performs well and how it can be applied to large-scale classification of up to 2000 classes.
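
The abstract refers to a Group-Sum output layer and a temperature $\tau$ without spelling out the computation. The following is a minimal sketch of how such a layer is commonly formulated for logic gate networks, assuming that the final-layer bits are split into equally sized groups (one per class), summed, and divided by $\tau$ before a softmax. The function names `group_sum` and `softmax`, the toy bit vector, and the chosen $\tau$ values are illustrative assumptions, not taken from the paper.

```python
import math


def group_sum(bits, num_classes, tau):
    """Sum equally sized groups of output bits and scale by 1/tau.

    Assumption: one contiguous group of bits per class.
    """
    if len(bits) % num_classes != 0:
        raise ValueError("number of output bits must be divisible by num_classes")
    group_size = len(bits) // num_classes
    return [
        sum(bits[c * group_size:(c + 1) * group_size]) / tau
        for c in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: 12 output bits, 3 classes, 4 bits per class.
bits = [1, 1, 1, 0,  1, 0, 0, 0,  1, 1, 0, 0]
probs_by_tau = {}
for tau in (1.0, 10.0):
    logits = group_sum(bits, num_classes=3, tau=tau)
    probs_by_tau[tau] = softmax(logits)
```

Under this formulation a larger $\tau$ shrinks the logits and flattens the softmax distribution, which is one way to see why the temperature has to be tuned together with the number of neurons per class.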

Paper Structure

This paper contains 35 sections, 4 equations, 19 figures, 4 tables.

Figures (19)

  • Figure 1: DLGNs (blue) consistently outperform MLPs (red) across classification tasks with up to 2000 classes. The result illustrates the potential of logic-gate-based architectures to remain effective when applied to large-scale classification problems.
  • Figure 2: Top row: illustration of class-specific position sampling and initialization in the synthetic dataset. For each class, a random subset of input positions is chosen and fixed to either 0 or 1, defining the class identity. The remaining positions are left unconstrained and are randomly assigned for each individual sample. Bottom row: four complete examples generated for the same class, demonstrating that all samples share the fixed positions while the random positions vary across instances. This design ensures that the dataset is easy to separate at the feature level, so performance differences can be attributed primarily to the capacity of the output layer rather than the backbone.
  • Figure 3: Accuracy of DLGNs compared to the MLP model as the number of classes increases. Small: a DLGN with a layer size of 64'000 logic gates. Big: a DLGN with a layer size of 256'000 logic gates. The MLP model refers to a conventional MLP with three layers of 512 neurons and batch normalization. The accuracy of all DLGNs stays high up to a few hundred classes, then drops sharply.
  • Figure 4: Accuracy of DLGNs compared to the MLP model as the number of ImageNet classes increases. Big: a DLGN with a layer size of 256'000 logic gates. The MLP model refers to a conventional MLP with three layers of 512 neurons and batch normalization.
  • Figure 5: Left: Distribution of neuron activation rates for two models. Larger $\tau$ values concentrate neurons at low activation rates, while smaller $\tau$ shifts the distribution toward higher activations. Right: Best $\tau\in\{0.1, 1, 10, 100\}$ for various numbers of output neurons and neurons per class.
  • ...and 14 more figures
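
The class-specific sampling described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the function names, the input dimension of 32, the 8 fixed positions per class, and the seed are all hypothetical, since the excerpt does not state the actual values.

```python
import random


def make_class_templates(num_classes, input_dim, num_fixed, rng):
    """For each class, choose `num_fixed` random positions and fix each to 0 or 1."""
    templates = []
    for _ in range(num_classes):
        positions = rng.sample(range(input_dim), num_fixed)
        templates.append({pos: rng.randint(0, 1) for pos in positions})
    return templates


def sample_example(template, input_dim, rng):
    """Draw one binary sample: random bits everywhere, then overwrite the fixed positions."""
    x = [rng.randint(0, 1) for _ in range(input_dim)]
    for pos, bit in template.items():
        x[pos] = bit
    return x


# Hypothetical sizes chosen only for illustration.
INPUT_DIM = 32
rng = random.Random(0)
templates = make_class_templates(num_classes=4, input_dim=INPUT_DIM, num_fixed=8, rng=rng)

# Four samples of the same class, mirroring the bottom row of Figure 2.
examples = [sample_example(templates[0], INPUT_DIM, rng) for _ in range(4)]
```

All samples of a class agree on that class's fixed positions and differ only on the unconstrained ones, so the classes stay easy to separate at the feature level, as the caption notes.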