Differentiable Weightless Neural Networks

Alan T. L. Bacellar, Zachary Susskind, Mauricio Breternitz, Eugene John, Lizy K. John, Priscila M. V. Lima, Felipe M. G. França

TL;DR

The paper addresses efficient edge inference by proposing Differentiable Weightless Neural Networks (DWNs), a LUT-based, multiplication-free architecture trained with Extended Finite Difference and augmented with Learnable Mapping, Learnable Reduction, and Spectral Regularization. In experiments on an FPGA accelerator, a low-power microcontroller, ultra-low-cost chips, and tabular datasets, DWNs improve energy efficiency, latency, and hardware area over the compared baselines while achieving competitive or superior accuracy. The results position DWNs as an edge-compatible alternative to conventional neural networks for high-throughput, low-resource inference on both structured data and hardware-constrained platforms.

Abstract

We introduce the Differentiable Weightless Neural Network (DWN), a model based on interconnected lookup tables. Training of DWNs is enabled by a novel Extended Finite Difference technique for approximate differentiation of binary values. We propose Learnable Mapping, Learnable Reduction, and Spectral Regularization to further improve the accuracy and efficiency of these models. We evaluate DWNs in three edge computing contexts: (1) an FPGA-based hardware accelerator, where they demonstrate superior latency, throughput, energy efficiency, and model area compared to state-of-the-art solutions, (2) a low-power microcontroller, where they achieve preferable accuracy to XGBoost while subject to stringent memory constraints, and (3) ultra-low-cost chips, where they consistently outperform small models in both accuracy and projected hardware area. DWNs also compare favorably against leading approaches for tabular datasets, with higher average rank. Overall, our work positions DWNs as a pioneering solution for edge-compatible high-throughput neural networks.

Paper Structure

This paper contains 29 sections, 10 equations, 5 figures, and 17 tables.

Figures (5)

  • Figure 1: A very simple DWN for the Iris dataset, shown at inference time. DWNs perform computation using multiple layers of directly chained lookup tables (LUT-3s, in this example). Inputs are binarized using a unary "thermometer" encoding, formed into tuples, and concatenated to address the first layer of LUTs. Binary LUT outputs are used to form addresses for subsequent layers. Outputs from the final layer of LUTs are summed to derive activations for each output class. No arithmetic operations are performed between layers of LUTs.
  • Figure 2: Learnable Mapping & Learnable Reduction in DWNs.
  • Figure 3: Implementation of a DWN on an FPGA. Each hardware LUT-6 (subdivided into two LUT-5s and a 2:1 MUX) can implement a six-input RAM node. Registers buffer LUT outputs.
  • Figure 4: An overview of the data layout of a DWN model implemented on the Elegoo Nano. This microcontroller has very limited resources, which necessitates careful memory management.
  • Figure 5: A 12:4 popcount tree composed of 8 full adders and 3 half adders.
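
The inference path described in the Figure 1 caption (thermometer encoding, chained LUT layers, and a final per-class sum) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the threshold values, the randomly filled tables and mappings, and the equal-sized grouping of final-layer outputs per class are all hypothetical.

```python
import numpy as np

def thermometer_encode(x, thresholds):
    """Unary encoding: bit (i, j) is 1 iff x[i] > thresholds[i, j]."""
    x = np.asarray(x, dtype=float)
    return (x[:, None] > thresholds).astype(np.uint8).ravel()

def lut_layer(bits, mapping, tables):
    """Evaluate one layer of n-input LUTs.

    mapping: (num_luts, n) indices into `bits` (which bits feed each LUT).
    tables:  (num_luts, 2**n) binary truth tables.
    """
    selected = bits[mapping].astype(np.int64)        # (num_luts, n)
    n = mapping.shape[1]
    weights = 1 << np.arange(n - 1, -1, -1)          # first input is the MSB
    addresses = selected @ weights                   # one address per LUT
    return tables[np.arange(len(tables)), addresses]

def dwn_infer(x, thresholds, layers, num_classes):
    """Chain LUT layers, then popcount the final outputs per class."""
    bits = thermometer_encode(x, thresholds)
    for mapping, tables in layers:
        bits = lut_layer(bits, mapping, tables)      # no arithmetic between layers
    # Assumption: final-layer LUTs are split into equal contiguous groups per class.
    scores = bits.reshape(num_classes, -1).sum(axis=1)
    return int(np.argmax(scores)), scores

# Toy setup: 4 features x 3 thresholds -> 12 input bits, two layers of six LUT-3s.
rng = np.random.default_rng(0)
thresholds = np.tile([0.25, 0.5, 0.75], (4, 1))
layer1 = (rng.integers(0, 12, size=(6, 3)),
          rng.integers(0, 2, size=(6, 8), dtype=np.uint8))
layer2 = (rng.integers(0, 6, size=(6, 3)),
          rng.integers(0, 2, size=(6, 8), dtype=np.uint8))
pred, scores = dwn_infer([0.1, 0.6, 0.9, 0.3], thresholds, [layer1, layer2], num_classes=3)
print(pred, scores)
```

In a trained DWN the tables and mappings would come from training rather than a random generator. The per-class sum at the end plays the role of the popcount that Figure 5 shows in hardware.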