Streamlining SIMD ISA Extensions with Takum Arithmetic: A Case Study on Intel AVX10.2
Laslo Hunhold
TL;DR
The paper investigates takum arithmetic, a tapered-precision number format, as a unified numeric basis for AVX10.2 in place of the many low-precision floating-point formats the extension currently supports. Drawing on a large-scale benchmark and methodological reformulations, it indicates that takums make the instruction set more consistent, readable, and extensible, and that they are empirically viable at 8-, 16-, and 32-bit precision. It proposes a systematic instruction-grouping and naming scheme to streamline the ISA and reports substantial simplification across the bitwise, mask, integer, floating-point, and cryptographic instructions. The work suggests a possible shift toward a single, efficient number format for both low- and high-precision SIMD computation, with implications for hardware design and for vector extensions on other architectures.
Abstract
Modern microprocessors extend their instruction set architecture (ISA) with Single Instruction, Multiple Data (SIMD) operations to improve performance. The Intel Advanced Vector Extensions (AVX) enhance the x86 ISA and are widely supported in Intel and AMD processors. The latest version, AVX10.2, places a strong emphasis on low-precision, non-standard floating-point formats, including bfloat16 and E4M3/E5M2 float8 (OCP 8-bit Floating Point, OFP8), primarily catering to deep learning applications rather than general-purpose arithmetic. However, as these formats remain within the IEEE 754 framework, they inherit its limitations, introducing inconsistencies and added complexity into the ISA. This paper examines the recently proposed tapered-precision takum floating-point format, which has been shown to offer significant advantages over IEEE 754 and its derivatives as a general-purpose number format. Using AVX10.2 as a case study, the paper explores the potential benefits of replacing the multitude of floating-point formats with takum as a uniform basis. The results indicate a more consistent instruction set, improving readability and flexibility while offering potential for 8- and 16-bit general-purpose SIMD arithmetic.
