Design and Implementation of a Takum Arithmetic Hardware Codec

Laslo Hunhold

TL;DR

The paper addresses limitations of existing floating-point and posit formats by presenting a hardware codec for both logarithmic takums (LNS) and linear takums, together with a novel internal LNS representation. It provides an open-source VHDL implementation optimized for FPGAs and a detailed encoder/decoder architecture that exploits the bounded exponent and compact preprocessing for hardware efficiency. On a Kintex UltraScale+ FPGA, the takum decoders outperform state-of-the-art posit codecs by up to 38% in latency and up to 50% in LUT usage, while the encoders achieve up to 13% lower latency with similar resource use. The work suggests that takums offer practical benefits for mixed-precision numerical computing. Future work includes VLSI and full-APU integration, quire considerations, and exploration of the chosen base $\sqrt{e}$ in the logarithmic form.

Abstract

The takum machine number format has been recently proposed as an enhancement over the posit number format, which is considered a promising alternative to the IEEE 754 floating-point standard. Takums retain the useful posit properties, but feature a novel exponent coding scheme that yields more precision for small and large magnitude numbers and a much higher and bounded dynamic range. This paper presents the design and implementation of a hardware codec for both takums (logarithmic number system, LNS) and linear takums (floating-point format). The codec design is emphasised, as it constitutes the primary distinguishing feature compared to logarithmic posits (LNS) and posits (floating-point format), which otherwise share similar internal representations. Furthermore, a novel internal representation for LNS is proposed. The presented takum codec, implemented in VHDL, demonstrates near-optimal scalability and performance on an FPGA. It achieves latency reductions of up to 38% and reduces LUT utilisation up to 50% compared to the best state-of-the-art posit codecs.
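To make the bit fields handled by the codec concrete, the following is a minimal software reference sketch of logarithmic takum decoding. It assumes the field layout (sign S, direction D, 3-bit regime R, characteristic C, mantissa M) and the value formula $(-1)^S \sqrt{e}^{\ell}$ as given in the earlier takum definition (cited on this page as 2024-takum), recalled here rather than taken from this summary. The function name `decode_takum` is hypothetical, and the code is not the VHDL datapath described in the paper; it only computes the real value a bit pattern stands for.

```python
import math


def decode_takum(bits: int, n: int = 16) -> float:
    """Decode an n-bit logarithmic takum into a Python float.

    Assumed definition (from the takum format paper, not this summary):
      r = uint(R) if D == 1 else 7 - uint(R)
      c = 2**r - 1 + uint(C)       if D == 1
      c = -2**(r + 1) + 1 + uint(C) if D == 0
      l = (-1)**S * (c + m),  value = (-1)**S * sqrt(e)**l
    """
    if n < 2:
        raise ValueError("a takum needs at least 2 bits")
    bits &= (1 << n) - 1

    # Special cases: all zeros is 0, sign bit alone is NaR (mapped to NaN here).
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return math.nan

    # Zero-extend on the right to at least 12 bits so that the sign,
    # direction, regime and a full 7-bit characteristic always fit.
    width = max(n, 12)
    t = bits << (width - n)

    s = (t >> (width - 1)) & 1            # sign bit
    d = (t >> (width - 2)) & 1            # direction bit
    regime = (t >> (width - 5)) & 0b111   # 3 regime bits
    r = regime if d else 7 - regime       # characteristic length, 0..7
    p = width - 5 - r                     # mantissa length
    char_bits = (t >> p) & ((1 << r) - 1)
    mant_bits = t & ((1 << p) - 1)

    # Characteristic c lies in {-255, ..., 254}, so the exponent is bounded.
    c = (2**r - 1 + char_bits) if d else (-(2 ** (r + 1)) + 1 + char_bits)
    m = mant_bits / 2**p                  # fractional part in [0, 1)
    ell = (-1) ** s * (c + m)             # logarithmic value

    return (-1) ** s * math.exp(ell / 2)  # sqrt(e)**ell == exp(ell / 2)
```

For example, `decode_takum(0x4000, 16)` yields `1.0` under these assumptions, and the largest positive 16-bit pattern `0x7FFF` stays below $\sqrt{e}^{255}$, which illustrates the bounded dynamic range the abstract refers to.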

Paper Structure (17 sections, 3 theorems, 16 equations, 4 figures, 2 tables)

Introduction
Takum Encoding Scheme
Internal Representations
Decoder
Characteristic/Exponent Determinator
Pre-, Logarithmic and Linear Decoder
Encoder
Underflow/Overflow Predictor
Characteristic Precursor Determinator
8-Bit Leading One Detector (LOD)
Extended Takum Generator
Rounder
Post-, Logarithmic and Linear Encoder
Evaluation
Decoder
...and 2 more sections

Key Result

proposition 1

Let $n \in \mathbb{N}_1$ and bit string $T := (S, D, R, C, M) \in \{0,1\}^n$ as in Definition 1 with $\tau((S, D, \dots$

Figures (4)

Figure 1: The logic circuit of the predecoder, largely separated into three main entities: the regime/antiregime determinator (E1), the characteristic/exponent determinator (E2) and the special case detector (E3). We assume $n \ge 12$ (thus omitting optional zero-expansion of $\mathit{takum}$ at the beginning for $n < 12$) for simplicity; the implemented predecoder works for any $n \ge 2$. We also assume an enabled $\mathit{output\_exponent}$, as disabling it would only flip the top MUX in E2. Vertical dashed lines indicate where the strands of a multi-signal are split up or combined.
Figure 2: The logic circuit of the postencoder, largely separated into five main entities: the underflow/overflow predictor (E1), the characteristic precursor determinator (E2), the extended takum generator (E3), the rounder (E4) and the output driver (E5). We assume $n \ge 12$ (thus omitting special case handling in the underflow/overflow predictor for $n < 12$) for simplicity; the implemented postencoder works for any $n \ge 2$. Vertical dashed lines indicate where the strands of a multi-signal are split up or combined.
Figure 3: Evaluation results for the decoder in terms of latency and LUT consumption.
Figure 4: Evaluation results for the encoder in terms of latency and LUT consumption.

Theorems & Definitions (8)

definition 1: takum encoding 2024-takum
definition 2: linear takum encoding 2024-takum
proposition 1: characteristic complement
proof
corollary 1: conditional characteristic complement
proof
proposition 2: characteristic precursor
proof
