On the Systematic Creation of Faithfully Rounded Commutative Truncated Booth Multipliers
Theo Drane, Samuel Coward, Mertcan Temel, Joe Leslie-Hurd
TL;DR
The paper addresses faithful rounding in fixed-point multiplication and the non-commutativity introduced by truncating Booth arrays. It proposes a commutative truncated Booth multiplier that inserts a small set of compensation bits and truncates the double Booth-encoded sum, backed by tight bounds on the truncation error $\Delta$ guaranteeing faithful rounding via $C-2^n < \Delta < C+2^k\max(\Delta)-2^k$ with $C$ a multiple of $2^k$. It derives the optimal truncation depth $k^*$ and corresponding constant $C^*$, and provides a step-by-step hardware construction and synthesis results showing up to 31% area and 38% power savings over a baseline, and up to 13% area and 12% power savings over a truncated-AND design. ACL2-based formal verification confirms both faithfulness and commutativity up to $n=42$, though verification for larger widths scales poorly. These results enable smaller, lower-power, faithfully rounded multipliers and pave the way for broader Booth-based truncation schemes.
Abstract
In many instances of fixed-point multiplication, a full precision result is not required. Instead it is sufficient to return a faithfully rounded result. Faithful rounding permits the machine representable number either immediately above or below the full precision result, if the latter is not exactly representable. Multipliers which take full advantage of this freedom can be implemented using less circuit area and consuming less power. The most common implementations internally truncate the partial product array. However, truncation applied to the most common of multiplier architectures, namely Booth architectures, results in non-commutative implementations. The industrial adoption of truncated multipliers is limited by the absence of formal verification of such implementations, since exhaustive simulation is typically infeasible. We present a commutative truncated Booth multiplier architecture and derive closed form necessary and sufficient conditions for faithful rounding. We also provide the bit-vectors giving rise to the worst-case error. We present a formal verification methodology based on ACL2 which scales up to 42 bit multipliers. We synthesize a range of commutative faithfully rounded multipliers and show that truncated booth implementations are up to 31% smaller than externally truncated multipliers.
