Acceleration of multi-component multiple-precision arithmetic with branch-free algorithms and SIMD vectorization

Tomonori Kouya

Abstract

Branch-free multiple-precision floating-point algorithms can significantly accelerate multi-component arithmetic built from hardware binary64 and binary32 values, particularly for triple- and quadruple-precision computations. In this study, we report benchmark results on x86 and ARM CPU platforms that quantify the speedups obtained in linear computations and polynomial evaluation by integrating these algorithms.

Paper Structure

This paper contains 12 sections, 7 equations, 4 figures, 4 tables, and 14 algorithms.

Figures (4)

  • Figure 1: Computation time (s) of Strassen matrix multiplication: Snapdragon (left), EPYC (right)
  • Figure 2: Computation time (s) of Strassen matrix multiplication: Snapdragon (left), EPYC (right)
  • Figure 3: Computation time ($\mu$s) for evaluating a real-coefficient polynomial $p_n(x)$ at real arguments: Snapdragon (top), EPYC (bottom)
  • Figure 4: Computation time ($\mu$s) for evaluating a real-coefficient polynomial $p_n(x)$ at complex arguments: Snapdragon (top), EPYC (bottom)