Small Logic-based Multipliers with Incomplete Sub-Multipliers for FPGAs

Andreas Böttcher, Martin Kumm

TL;DR

This work addresses the need for efficient small-width multipliers on FPGAs for AI inference by replacing conventional rectangular sub-multipliers with incomplete, irregular LUT-based tiles within the multiplier tiling framework. It expands the design space through a systematic search (restricted to a $4\times4$ search space) and uses truth-table simplification to identify high-efficiency tiles, then applies an ILP-based optimization to jointly select tiles and compressor-tree structures. Synthesis results on Kintex-7 show that incomplete tiles reduce LUT usage by up to $17.6\%$ (average $3.7\%$) for multiplier sizes up to $16\times16$, with notable gains in dense packing scenarios and competitive critical path delay (CPD) and latency. The approach delivers higher arithmetic density for AI workloads while keeping critical-path and latency characteristics practical, and is applicable to modern FPGA platforms.

Abstract

There is a recent trend in artificial intelligence (AI) inference towards lower-precision data formats of 8 bits and below. As multiplication is the most complex operation in typical inference tasks, there is a large demand for efficient small multipliers. The large DSP blocks are of limited use for implementing many small multipliers efficiently. Hence, this work proposes better logic-based multipliers, which is especially beneficial for small word sizes. Our work is based on the multiplier tiling method, in which a multiplier is composed of several sub-multiplier tiles. The key observation is that these sub-multipliers do not necessarily have to perform a complete (rectangular) $N\times K$ multiplication: more efficient sub-multipliers are possible that are incomplete (non-rectangular). This work first identifies efficient incomplete, irregular sub-multipliers and then demonstrates improvements over state-of-the-art designs. Optimal solutions are found using integer linear programming (ILP) and evaluated in FPGA synthesis experiments.
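The tiling idea can be illustrated with a minimal Python sketch. It assumes a tile is simply a set of partial-product positions $(i, j)$, each contributing $a_i b_j 2^{i+j}$, so that any set of tiles covering the partial-product grid exactly once reproduces the full product, whether the tiles are rectangular or not. The helper names (`tile_value`, `rect_tile`, `tiled_multiply`, `is_exact_cover`) and the irregular tiling shown are hypothetical and chosen for illustration only; they are not tiles or code from the paper, and the sketch says nothing about LUT cost.

```python
from itertools import product

def tile_value(a, b, cells):
    """Sum of partial products a_i * b_j * 2**(i+j) over the cells of one tile."""
    return sum((((a >> i) & 1) & ((b >> j) & 1)) << (i + j) for i, j in cells)

def rect_tile(x0, y0, w, h):
    """Cells of a complete w x h rectangular tile anchored at (x0, y0)."""
    return {(i, j) for i in range(x0, x0 + w) for j in range(y0, y0 + h)}

def tiled_multiply(a, b, tiles):
    """Add up all tile outputs (the job of the compressor tree in hardware)."""
    return sum(tile_value(a, b, cells) for cells in tiles)

def is_exact_cover(tiles, wa, wb):
    """True if the tiles cover every cell of the wa x wb grid exactly once."""
    cells = [c for t in tiles for c in t]
    return len(cells) == len(set(cells)) and set(cells) == rect_tile(0, 0, wa, wb)

# A 6x4 multiplier composed of four complete 3x2 tiles.
RECT_TILES = [
    rect_tile(0, 0, 3, 2), rect_tile(3, 0, 3, 2),
    rect_tile(0, 2, 3, 2), rect_tile(3, 2, 3, 2),
]

# Hypothetical irregular tiling: one cell moves from the first tile to the
# second, making both non-rectangular while the grid stays exactly covered.
IRREGULAR_TILES = [set(t) for t in RECT_TILES]
IRREGULAR_TILES[0].remove((2, 1))
IRREGULAR_TILES[1].add((2, 1))

def check(tiles, wa=6, wb=4):
    """Exhaustively compare the tiled result with a * b for all inputs."""
    return all(tiled_multiply(a, b, tiles) == a * b
               for a, b in product(range(1 << wa), range(1 << wb)))

if __name__ == "__main__":
    print(check(RECT_TILES), check(IRREGULAR_TILES))
```

Both tilings pass the exhaustive check because correctness only depends on the exact cover of the grid. Which tile shapes are worth using is a separate question of how cheaply each shape maps to LUTs, which is what the paper's search and ILP formulation address.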

Paper Structure (16 sections, 10 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Tiling (a) and compressor tree (b) of a $6\times 4$-multiplier, composed of four $3\times 2$-tiles.
  • Figure 2: Geometric shapes of rectangular LUT-based tiles [Boettcher20]
  • Figure 3: Motivational example
  • Figure 4: First 24 tiles from the efficiency classes above $E_t = 1.0$ of the $4\times 4$ search space
  • Figure 5: Design space for tiles of the highest efficiency
  • ...and 4 more figures