Multiplier Design Addressing Area-Delay Trade-offs by using DSP and Logic resources on FPGAs

Andreas Böttcher; Martin Kumm

TL;DR

The paper tackles efficient FPGA multipliers under DSP-LUT constraints by integrating small Booth-Array sub-multipliers into a multiplier-tiling framework and optimizing tile selection and compressor-tree design via ILP. It extends FloPoCo to incorporate Booth tiles and shows that restricting Booth depth to four levels achieves a favorable balance between area and delay, especially when coupled with DSP-based tiling. The approach yields new Pareto-optimal points and often reduces LUT usage with competitive critical-path delay compared to state-of-the-art logic- and Booth-based designs. Open-source tooling and ILP-based global optimization enable practical, scalable design of medium-to-large multipliers on modern FPGAs.
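The Booth tiles mentioned above are built on radix-4 Booth recoding. The minimal Python sketch below makes that recoding concrete. It recodes an n-bit two's-complement operand into n/2 digits in {-2, ..., 2}, and each digit selects one partial-product row. Assumption: we read one "Booth level" as one recoded partial-product row. Under that reading, an 8-bit signed operand needs four rows. The function names are hypothetical, and the sketch does not model the LUT/carry-chain mapping of the rows (cf. Figure 2).

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement integer y into n/2 radix-4 Booth digits.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1] (with y[-1] = 0), so every digit
    lies in {-2, -1, 0, 1, 2} and y == sum(d_i * 4**i).
    """
    if n % 2:
        raise ValueError("n must be even")
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit into n bits")
    pattern = y & ((1 << n) - 1)  # two's-complement bit pattern of y

    def bit(k: int) -> int:
        return (pattern >> k) & 1 if k >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> tuple[int, int]:
    """Multiply x by the n-bit signed operand y via radix-4 Booth partial products.

    Returns (product, number_of_partial_product_rows).
    """
    digits = booth_radix4_digits(y, n)
    # Each row is x times a digit in {-2..2} (a shift and/or negation in hardware),
    # weighted by 4**i, i.e. shifted left by 2*i bits.
    rows = [(d * x) << (2 * i) for i, d in enumerate(digits)]
    return sum(rows), len(rows)


product, rows = booth_multiply(-93, 57, 8)
print(product, rows)  # -5301 4
```

Every row can be formed from x by shifting and negating, with no true multiplication. That property is what lets a Booth-Array map each row onto a single LUT-plus-carry-chain stage. The number of rows grows with the operand width, and the adder depth grows with it.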

Abstract

The major challenge when designing multipliers for FPGAs is to address several trade-offs: on the one hand in terms of performance, and on the other hand in terms of resources, where either DSP blocks or look-up tables (LUTs) can be utilized. As DSPs are a relatively limited resource, the problem of under- or over-utilizing them has previously been addressed by the concept of multiplier tiling, which assembles multipliers from DSPs and small supplemental LUT-based multipliers. However, there has always been an efficiency gap between tiling-based multipliers and radix-4 Booth-Arrays. The monolithic Booth-Array was shown to be considerably more efficient in terms of LUT resources on many modern FPGA architectures. Even so, it typically has a significantly higher critical path delay (or latency, when pipelined) than multipliers designed by tiling. This work proposes and analyzes the use of smaller Booth-Arrays as sub-multipliers that are integrated into existing tiling-based methods. In this way, better trade-off points between area and delay can be reached while utilizing a user-specified number of DSP blocks. Synthesis experiments show that the critical path delay can be reduced compared to large Booth-Arrays, while LUT resources are significantly reduced compared to previous tiling approaches.
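Multiplier tiling rests on a simple idea. The product rectangle of a wx-by-wy multiplication is covered by sub-multiplier tiles. Each tile multiplies one slice of X by one slice of Y, and the shifted tile results are then summed. The minimal Python sketch below shows this for unsigned operands. The `Tile` type, the helper names and the 24x24 example tiling (one 17x24 DSP tile plus two 7x12 logic tiles) are hypothetical illustrations. They are not the tile-set or the ILP-based tile selection used in the paper.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    x0: int    # lowest bit of operand X covered by this tile
    y0: int    # lowest bit of operand Y covered by this tile
    w: int     # number of X bits
    h: int     # number of Y bits
    kind: str  # e.g. "dsp", "booth", "lut" (labels only; no cost model here)


def covers_exactly(tiles: list[Tile], wx: int, wy: int) -> bool:
    """True if the tiles cover the wx-by-wy product rectangle without gaps or overlaps."""
    covered = set()
    for t in tiles:
        for i in range(t.x0, t.x0 + t.w):
            for j in range(t.y0, t.y0 + t.h):
                if not (0 <= i < wx and 0 <= j < wy) or (i, j) in covered:
                    return False
                covered.add((i, j))
    return len(covered) == wx * wy


def tiled_product(x: int, y: int, tiles: list[Tile]) -> int:
    """Unsigned product assembled from sub-multiplier tiles.

    Each tile multiplies a slice of X by a slice of Y; the shifted tile results
    are the partial products that a compressor tree would sum in hardware.
    """
    total = 0
    for t in tiles:
        x_slice = (x >> t.x0) & ((1 << t.w) - 1)
        y_slice = (y >> t.y0) & ((1 << t.h) - 1)
        total += (x_slice * y_slice) << (t.x0 + t.y0)
    return total


# Hypothetical 24x24 tiling: one 17x24 DSP tile plus two 7x12 logic tiles.
EXAMPLE_TILES = [
    Tile(0, 0, 17, 24, "dsp"),
    Tile(17, 0, 7, 12, "booth"),
    Tile(17, 12, 7, 12, "lut"),
]

assert covers_exactly(EXAMPLE_TILES, 24, 24)
print(tiled_product(0xABCDEF, 0x123456, EXAMPLE_TILES) == 0xABCDEF * 0x123456)  # True
```

An exact cover guarantees a correct product, because every bit pair x_i*y_j is counted exactly once. The design question is therefore which tile kinds to place where. For example, a limited number of DSP tiles could be combined with small Booth or LUT tiles for the remaining area.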

Paper Structure (12 sections, 5 equations, 5 figures, 5 tables)

Introduction
Multiplier Tiling
Utilized Multiplier Tile-Set
Booth-Array Multiplication
Integration of Booth-Arrays into Multiplier Tiling
Evaluation of Cost Relative to Booth-Level
Evaluation of Delay Relative to Booth Level
Results
Experimental Setup
Evaluation of the Impact of the Introduction of Booth-Arrays to Tiling
Comparison to State-of-the-Art Designs
Conclusion

Figures (5)

Figure 1: Tiling (a) and compressor tree (b) of a $24\times 24$-multiplier, composed of four sub-multiplier-tiles.
Figure 2: Slice configuration for Booth multipliers
Figure 3: Overall structure of an $8\times 8$ Booth multiplier
Figure 4: Dependency between size and efficiency
Figure 5: Critical path delay vs. complexity
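Figure 1(b) shows that the tile results are summed by a compressor tree. The minimal Python sketch below illustrates what such a tree computes. It places shifted partial products on a bit heap, reduces every column to at most two bits with full adders (3:2 counters) in greedy Wallace-style stages, and leaves two rows for a final carry-propagate adder. This is a plain textbook reduction chosen for illustration, not the paper's ILP-based compressor-tree optimization. The function names and the four 12x12 tile products in the example are hypothetical.

```python
from collections import defaultdict


def build_bit_heap(terms):
    """Place partial products on a bit heap.

    terms: iterable of (value, width, shift); value must fit into `width` bits.
    Every one of the `width` bit positions is placed, so the heap shape does not
    depend on the data (as with hardware signals).
    """
    heap = defaultdict(list)
    for value, width, shift in terms:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit into width bits")
        for k in range(width):
            heap[shift + k].append((value >> k) & 1)
    return heap


def compress_to_two_rows(heap):
    """Greedy Wallace-style reduction with full adders (3:2 counters).

    Each full adder replaces three bits of weight 2**c by a sum bit of weight 2**c
    and a carry bit of weight 2**(c+1), preserving the heap value. Returns the
    reduced heap and the number of stages used.
    """
    stages = 0
    while any(len(bits) > 2 for bits in heap.values()):
        reduced = defaultdict(list)
        for col in sorted(heap):
            bits = heap[col]
            i = 0
            while len(bits) - i >= 3:
                a, b, c = bits[i:i + 3]
                reduced[col].append(a ^ b ^ c)
                reduced[col + 1].append((a & b) | (a & c) | (b & c))
                i += 3
            reduced[col].extend(bits[i:])
        heap = reduced
        stages += 1
    return heap, stages


def to_two_rows(heap):
    """Read a heap with at most two bits per column as two integers for the final adder."""
    row_a = row_b = 0
    for col, bits in heap.items():
        if len(bits) > 2:
            raise ValueError("heap not yet reduced to two rows")
        if len(bits) >= 1:
            row_a |= bits[0] << col
        if len(bits) == 2:
            row_b |= bits[1] << col
    return row_a, row_b


# Four hypothetical 12x12 tile products of a 24x24 multiplication (24-bit results).
x_lo, x_hi, y_lo, y_hi = 0xABC, 0xDEF, 0x123, 0x456
terms = [
    (x_lo * y_lo, 24, 0),
    (x_hi * y_lo, 24, 12),
    (x_lo * y_hi, 24, 12),
    (x_hi * y_hi, 24, 24),
]
rows, stages = compress_to_two_rows(build_bit_heap(terms))
row_a, row_b = to_two_rows(rows)
print(row_a + row_b == ((x_hi << 12) | x_lo) * ((y_hi << 12) | y_lo), stages)
```

The number of stages grows with the tallest column of the heap. This is one reason why the number and placement of sub-multiplier tiles affect both LUT cost and delay.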
