Small Logic-based Multipliers with Incomplete Sub-Multipliers for FPGAs

Andreas Böttcher; Martin Kumm

TL;DR

This work addresses the need for efficient small-width multipliers on FPGAs for AI inference by replacing conventional rectangular sub-multipliers with incomplete, irregular LUT-based tiles within a multiplier tiling framework. It expands the design space through a systematic search (restricted to a $4\times4$ board), uses truth-table simplification to identify high-efficiency tiles, and then applies an ILP-based optimization to jointly select tiles and compressor-tree structures. Empirical results on Kintex-7 show that incomplete tiles reduce LUT usage by up to $17.6\%$ (average $3.7\%$) across sizes up to $16\times16$, with notable gains in dense packing scenarios and competitive critical path delay (CPD) and latency. The approach delivers higher arithmetic density for AI workloads while keeping critical path delay and latency practical, and is broadly applicable to modern FPGA platforms.

Abstract

There is a recent trend in artificial intelligence (AI) inference towards lower-precision data formats of 8 bits and below. As multiplication is the most complex operation in typical inference tasks, there is a large demand for efficient small multipliers. The large DSP blocks are limited in how efficiently they can implement many small multipliers. Hence, this work proposes better logic-based multipliers, which is especially beneficial for small word sizes. Our work is based on the multiplier tiling method, in which a multiplier is composed of several sub-multiplier tiles. The key observation is that these sub-multipliers do not necessarily have to perform a complete (rectangular) $N\times K$ multiplication: incomplete (non-rectangular) sub-multipliers can be more efficient. This work first identifies efficient incomplete, irregular sub-multipliers and then demonstrates improvements over state-of-the-art designs. Optimal solutions are found using integer linear programming (ILP) and evaluated in FPGA synthesis experiments.
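The tiling idea in the abstract can be illustrated with a minimal Python sketch. It relies only on the identity that a product is the sum of all partial-product bits $a_i b_j 2^{i+j}$, so any partition of the set of $(i, j)$ cells into tiles, rectangular or not, reproduces the product. The helper names (`tile_value`, `tiled_product`, `rect`, `check`) and the L-shaped tile are hypothetical and chosen for illustration; the L-shape is not one of the tiles identified in the paper, and the sketch checks arithmetic correctness only, not LUT cost.

```python
from itertools import product


def tile_value(a, b, cells):
    """Sum of the partial-product bits a_i * b_j * 2^(i+j) covered by one tile."""
    return sum((((a >> i) & 1) & ((b >> j) & 1)) << (i + j) for i, j in cells)


def tiled_product(a, b, tiles):
    """Add up the outputs of all tiles (the job of the compressor tree)."""
    return sum(tile_value(a, b, cells) for cells in tiles)


def rect(x0, y0, w, h):
    """Cells of a complete w x h rectangular tile anchored at (x0, y0)."""
    return {(i, j) for i in range(x0, x0 + w) for j in range(y0, y0 + h)}


# A 6x4 multiplier built from four 3x2 tiles, as in the caption of Figure 1.
RECT_TILES = [rect(0, 0, 3, 2), rect(3, 0, 3, 2), rect(0, 2, 3, 2), rect(3, 2, 3, 2)]

# Hypothetical incomplete tiling of a 4x4 board: an L-shaped tile plus the remainder.
BOARD = rect(0, 0, 4, 4)
L_SHAPE = rect(0, 0, 3, 2) | {(0, 2), (1, 2)}  # assumption: illustrative shape only
IRREGULAR_TILES = [L_SHAPE, BOARD - L_SHAPE]


def check(tiles, width_a, width_b):
    """Exhaustively compare the tiled result with a * b for all unsigned inputs."""
    return all(
        tiled_product(a, b, tiles) == a * b
        for a, b in product(range(1 << width_a), range(1 << width_b))
    )


if __name__ == "__main__":
    print(check(RECT_TILES, 6, 4))
    print(check(IRREGULAR_TILES, 4, 4))
```

Both checks succeed because each tiling covers every cell exactly once. Which shapes are cheap to map to LUTs is a separate question that the sketch does not model.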

Paper Structure (16 sections, 10 equations, 9 figures, 5 tables)

Introduction
Multiplier Tiling
Conventional Rectangular Multiplier Tiles
Motivational Example
Searching efficient structures
Implementation of the Search
Identified Structures
Selecting the Tile Set for the Optimization
Results
Experimental Setup
Comparison With Previous Tiling
Comparison to Various Small State-of-the-Art Multipliers
Comparison with Previous Truncated Tiling
Packing Experiment
Conclusion
...and 1 more section

Figures (9)

Figure 1: Tiling (a) and compressor tree (b) of a $6\times 4$-multiplier, composed of four $3\times 2$-tiles.
Figure 2: Geometric shapes of rectangular LUT-based tiles [Boettcher20]
Figure 3: Motivational example
Figure 4: First 24 tiles from the efficiency classes above $E_t = 1.0$ of the $4\times 4$ search space
Figure 5: Design space for tiles of the highest efficiency
...and 4 more figures
