
Integer Representations in IEEE 754, Posit, and Takum Arithmetics

Laslo Hunhold

TL;DR

This work rigorously analyzes the integral representation capabilities of the IEEE 754, OFP8, bfloat16, posit, and takum formats. It provides formal derivations of the minimal bit length needed to encode an arbitrary integer and establishes the largest consecutive representable integer for each format, including two-stage proofs for posits and takums. The results show that takum arithmetic consistently matches or surpasses IEEE 754 and posits in consecutive-integer reach, especially at 32, 64, and 128 bits, while retaining backward compatibility with IEEE 754; posits, by contrast, lag at larger widths. Overall, takums emerge as a promising, backward-compatible alternative to IEEE 754 for integer representation.

Abstract

Although not primarily designed for this purpose, floating-point numbers are often used to represent integral values, with some applications explicitly relying on this capability. However, the integral representation properties of IEEE 754 floating-point numbers have not yet been formally investigated. Recently, the bfloat16, posit and takum machine number formats have been proposed as alternatives to IEEE 754, while OCP 8-bit floating point (OFP8) types (E4M3 and E5M2) have been introduced as 8-bit extensions of IEEE 754, albeit with slight deviations from the standard. It is therefore timely to evaluate IEEE 754 and to assess how effectively the new formats fulfil this function in comparison with the standard they aim to replace. This paper presents the first rigorous derivations and proofs of the integral representation capabilities of IEEE 754 floating-point numbers, OFP8, bfloat16, posits, and takums. We examine both the exact number of bits required to represent a given integer and the largest consecutive integer representable with a specified bit width. The results show that OFP8 yields mixed outcomes, bfloat16 generally underperforms, and posits fail to scale effectively, whereas takums consistently match or outperform the other formats, maintaining backward compatibility with IEEE 754.

Paper Structure

This paper contains 13 sections, 5 theorems, 39 equations, 2 figures, 1 table.

Key Result

proposition 1

Consider an IEEE 754 floating-point format with $n_e$ (explicit) exponent bits and $n_f$ fraction bits, where $n_e$ is sufficiently large for the exponent to assume the value $n_f + 1$. Then all $m \in \mathbb{Z}$ with $|m| \le 2^{n_f + 1}$ are exactly representable in the format.
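
The proposition can be checked empirically for the common binary formats. The following Python sketch is an illustration rather than part of the paper: it assumes the bound in question is $2^{n_f+1}$ (the well-known value $2^{24}$ for binary32 and $2^{53}$ for binary64), and the helper names `roundtrips` and `largest_consecutive` are hypothetical. It uses the standard `struct` module to round an integer into a given format and tests whether the value survives unchanged.

```python
import struct

# (struct format code, number of fraction bits n_f) for three IEEE 754 formats.
FORMATS = {
    "binary16": ("<e", 10),
    "binary32": ("<f", 23),
    "binary64": ("<d", 52),
}


def roundtrips(m: int, code: str) -> bool:
    """Return True if the integer m is exactly representable in the format.

    The integer is packed (and thereby rounded) into the target format and
    unpacked again; Python compares int and float values exactly.
    """
    try:
        packed = struct.pack(code, m)
    except OverflowError:
        # m lies outside the finite range of the format.
        return False
    return struct.unpack(code, packed)[0] == m


def largest_consecutive(n_f: int) -> int:
    """Assumed bound 2^(n_f + 1), given a sufficiently large exponent range."""
    return 2 ** (n_f + 1)


if __name__ == "__main__":
    for name, (code, n_f) in FORMATS.items():
        bound = largest_consecutive(n_f)
        # The bound and its neighbours below are representable, in both signs.
        ok_inside = all(roundtrips(s * m, code)
                        for s in (1, -1)
                        for m in (bound - 2, bound - 1, bound))
        # The first integer above the bound is not representable.
        ok_outside = not roundtrips(bound + 1, code)
        print(f"{name}: bound = 2^{n_f + 1}, inside ok = {ok_inside}, "
              f"bound + 1 rejected = {ok_outside}")
```

For binary16 the range is small enough to test every integer in $[-2^{11}, 2^{11}]$ exhaustively with `roundtrips`; for the wider formats the sketch only probes the values next to the bound.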

Figures (2)

  • Figure 1: Non-fraction length (the number of bits used to encode the sign and exponent, including those fraction bits reassigned in the case of subnormal numbers) as a function of the coded exponent for IEEE 754, OFP8, bfloat16, takum, and posit floating-point formats. The plot is inverted, so larger values correspond to more available fraction bits.
  • Figure 2: The largest consecutive integers for IEEE 754 for $n \in \{ 16,32,64,128\}$, OFP8, bfloat16, takums and posits relative to the bit string length $n$.

Theorems & Definitions (12)

  • definition 1: posit encoding [posits-beating_floating-point-2017, posits-standard-2022]
  • definition 2: linear takum encoding [2024-takum]
  • proposition 1: Consecutive IEEE 754 Integers
  • proof
  • proposition 2: Posit Integer Representation
  • proof
  • proposition 3: Consecutive Posit Integers
  • proof
  • proposition 4: Takum Integer Representation
  • proof
  • ...and 2 more