Acceleration of multi-component multiple-precision arithmetic with branch-free algorithms and SIMD vectorization

Tomonori Kouya

Abstract

Branch-free algorithms can significantly accelerate multi-component multiple-precision floating-point arithmetic, which is implemented by combining hardware-supported binary64 and binary32 numbers, particularly for triple- and quadruple-precision computations. In this study, we report benchmark results on x86 and ARM CPU platforms that quantify the acceleration obtained in linear computations and polynomial evaluation by integrating these algorithms.
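As a rough illustration of what "branch-free" means for multi-component arithmetic, the sketch below contrasts a branching error-free addition with Knuth's TwoSum, which needs no magnitude comparison, and builds a simple double-double style addition on top of it. The function names (`fast_two_sum_branching`, `two_sum`, `dd_add`, `exact`) are hypothetical and chosen for this example; this is a textbook construction assuming IEEE 754 binary64 Python floats, not the algorithms proposed in the paper, and it shows structure only, not performance.

```python
from fractions import Fraction


def fast_two_sum_branching(a, b):
    """Error-free addition that needs |a| >= |b|, so it swaps via a branch."""
    if abs(a) < abs(b):
        a, b = b, a
    s = a + b
    e = b - (s - a)
    return s, e


def two_sum(a, b):
    """Knuth's TwoSum: s + e equals a + b exactly, with no comparison."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two (hi, lo) pairs using only straight-line floating-point code.

    This is a simple textbook variant: the low parts are added with an
    ordinary rounded addition, so the result is accurate but not exact
    in general.
    """
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize into a (hi, lo) pair


def exact(pair):
    """Exact rational value of a (hi, lo) pair, for checking only."""
    return Fraction(pair[0]) + Fraction(pair[1])
```

Because `two_sum` contains no data-dependent control flow, the same sequence of operations can be applied lane by lane to SIMD registers, which is the property that makes this style of algorithm attractive for vectorization.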

Paper Structure (12 sections, 7 equations, 4 figures, 4 tables, 14 algorithms)

Introduction
Branch-Free Algorithms for DD, TD, and QD Addition and Multiplication
Remarks on double-word (DW) Arithmetic
Remarks on TW Arithmetic
Remarks on QW Arithmetic
Benchmark Tests for Real and Complex Square Matrix Multiplication
Real Matrix Multiplication
Benchmark Tests for Complex Matrix Multiplication
Benchmark Tests for Polynomial Evaluation and Algebraic Equation Solvers
Evaluation of Real-Coefficient Polynomial Functions
Algebraic Equation Solver Based on the DK Method
Conclusion and Future Work

Figures (4)

Figure 1: Computation time (s) of Strassen matrix multiplication: Snapdragon (left), EPYC (right)
Figure 2: Computation time (s) of Strassen matrix multiplication: Snapdragon (left), EPYC (right)
Figure 3: Computation time ($\mu$s) for evaluating a real-coefficient polynomial $p_n(x)$ at real arguments: Snapdragon (top), EPYC (bottom)
Figure 4: Computation time ($\mu$s) for evaluating a real-coefficient polynomial $p_n(x)$ at complex arguments: Snapdragon (top), EPYC (bottom)
