Implementation and Analysis of Thermometer Encoding in DWN FPGA Accelerators

Michael Mecik, Martin Kumm

TL;DR

The paper analyzes the hardware cost of integrating thermometer encoding into differentiable weightless neural network (DWN) accelerators on FPGAs. It introduces a complete hardware generator covering the thermometer encoder, LUT layer, and classification logic, which enables a full resource evaluation on the Jet Substructure Classification (JSC) task. The key finding is that encoding can increase LUT usage by up to 3.20$\times$ and dominates cost in small networks, while post-training quantization and fine-tuning can mitigate this overhead and preserve accuracy. The work highlights the need for encoding-aware co-design and provides guidance on how encoder size, LUT counts, and popcount logic shape overall hardware efficiency across model scales.
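To make the stages named above concrete, the following is a minimal software sketch of the datapath after the encoder: a LUT layer followed by classification logic built from per-class popcounts and an argmax. It is not the paper's generator. The names `lut_layer`, `classify`, `conn`, and `tables` are hypothetical, and the sketch assumes a single LUT layer, LUT outputs grouped contiguously by class, and ties resolved toward the lowest class index (NumPy's `argmax` behavior, which may differ from the paper's comparator-based argmax).

```python
import numpy as np

def lut_layer(bits, conn, tables):
    """Evaluate one LUT layer (illustrative, not the paper's implementation).

    bits:   (n_in,) array of 0/1 encoder outputs
    conn:   (n_luts, k) indices of the input bits wired to each LUT
    tables: (n_luts, 2**k) truth tables holding 0/1 entries
    """
    k = conn.shape[1]
    weights = 1 << np.arange(k)  # input j of a LUT has address weight 2**j
    addr = (bits[conn].astype(np.int64) * weights).sum(axis=1)
    return tables[np.arange(len(tables)), addr]

def classify(lut_out, n_classes):
    """Popcount each class's group of LUT outputs, then take the argmax."""
    counts = lut_out.reshape(n_classes, -1).sum(axis=1)
    return int(np.argmax(counts)), counts

# Toy example: 4 encoder bits, four 2-input LUTs, 2 classes (assumed sizes).
bits = np.array([1, 0, 1, 1], dtype=np.uint8)
conn = np.array([[0, 1], [2, 3], [0, 2], [1, 3]])
AND, OR = [0, 0, 0, 1], [0, 1, 1, 1]
tables = np.array([AND, AND, OR, OR], dtype=np.uint8)

lut_out = lut_layer(bits, conn, tables)
pred, counts = classify(lut_out, n_classes=2)
```

In hardware each row of `tables` corresponds to one physical LUT and `conn` is fixed wiring once training has finished, so the popcount and argmax stages are the only arithmetic left after the encoder.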

Abstract

Fully parallel neural network accelerators on field-programmable gate arrays (FPGAs) offer high throughput for latency-critical applications but face hardware resource constraints. Weightless neural networks (WNNs) efficiently replace arithmetic with logic-based inference. Differentiable weightless neural networks (DWN) further optimize resource usage by learning connections between encoders and LUT layers via gradient-based training. However, DWNs rely on thermometer encoding, and the associated hardware cost has not been fully evaluated. We present a DWN hardware generator that includes thermometer encoding explicitly. Experiments on the Jet Substructure Classification (JSC) task show that encoding can increase LUT usage by up to 3.20$\times$, dominating costs in small networks and highlighting the need for encoding-aware hardware design in DWN accelerators.
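As a reminder of what thermometer encoding computes, here is a small sketch assuming evenly spaced thresholds over a known input range. The helper names `uniform_thresholds` and `thermometer_encode` are hypothetical, and the threshold placement is an assumption; the paper's encoder and its threshold selection may differ.

```python
import numpy as np

def uniform_thresholds(lo, hi, n_bits):
    """Return n_bits evenly spaced thresholds strictly inside (lo, hi)."""
    return np.linspace(lo, hi, n_bits + 2)[1:-1]

def thermometer_encode(x, thresholds):
    """Bit i is 1 when x exceeds thresholds[i] (thresholds ascending)."""
    x = np.asarray(x, dtype=float)
    return (x[..., None] > thresholds).astype(np.uint8)

# Four thresholds at roughly 0.2, 0.4, 0.6, 0.8 (assumed range [0, 1]).
thresholds = uniform_thresholds(0.0, 1.0, n_bits=4)
bits = thermometer_encode([0.1, 0.5, 0.9], thresholds)
# 0.1 -> 0000, 0.5 -> 1100, 0.9 -> 1111
```

Every output bit is one comparison against a constant, so the encoder needs one comparator per threshold per input feature. This is why its cost can outweigh a small LUT layer.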

Paper Structure

This paper contains 6 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overview DWN Architecture
  • Figure 2: Distributive vs. Uniform Encoding of JSC Dataset
  • Figure 3: Thermometer Encoder Component
  • Figure 4: Argmax Component composed of Index Comparators
  • Figure 5: Component Breakdown for DWN-PEN + FT
  • ...and 1 more figure