Evaluation of POSIT Arithmetic with Accelerators

Naohito Nakasato; Yuki Murakami; Fumiya Kono; Maho Nakata

TL;DR

This work evaluates 32-bit POSIT arithmetic in the Posit($32$, $2$) format, implemented as accelerators on FPGAs and GPUs for dense linear algebra. By extending MPLAPACK to Posit($32$, $2$) and implementing FLO-Posit-based FPGA cores alongside ported GPU kernels, the study demonstrates that POSIT can yield modest accuracy gains in the appropriate input regime and achieves substantial acceleration for GEMM and matrix decompositions. The results show that Posit($32$, $2$) provides about $0.5$–$1.0$ more digits of accuracy than binary32 in the golden zone, where the matrix elements have magnitude close to 1, and that LU and Cholesky decompositions benefit from acceleration, although performance and power characteristics vary across platforms. This work illuminates platform-dependent trade-offs between FPGAs and GPUs for POSIT-based linear algebra and informs future directions for dedicated POSIT hardware and broader arithmetic formats.

Abstract

We present an evaluation of 32-bit POSIT arithmetic through its implementation as accelerators on FPGAs and GPUs. POSIT is a floating-point number format that adaptively changes the size of its fractional part. We developed hardware designs for FPGAs and software for GPUs to accelerate linear algebra operations using Posit(32,2) arithmetic. Our FPGA- and GPU-based accelerators significantly accelerated the Cholesky and LU decomposition algorithms for dense matrices in Posit(32,2) arithmetic. In terms of numerical accuracy, Posit(32,2) arithmetic is approximately 0.5 to 1.0 digits more accurate than the standard 32-bit floating-point format (binary32), especially when the norm of the elements of the input matrix is close to 1. Evaluating power consumption, we observed that the power efficiency of the accelerators ranged from 0.043 to 0.076 Gflops/watt for the LU decomposition in Posit(32,2) arithmetic. The power efficiency of the latest GPUs as accelerators of Posit(32,2) arithmetic is better than that of the evaluated FPGA chip.
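To make the "adaptive fractional part" concrete, the following is a minimal Python sketch of how a Posit(32,2) bit pattern can be decoded. It follows the general posit layout (sign, variable-length regime, up to two exponent bits, then fraction) and is not taken from the paper's FPGA or GPU code; the function name `decode_posit` and its return convention are assumptions made for illustration only.

```python
import math

def decode_posit(bits: int, n: int = 32, es: int = 2):
    """Decode an n-bit posit with es exponent bits.

    Hypothetical helper for illustration (not the paper's implementation).
    Returns (value, number_of_fraction_bits).
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0, 0
    if bits == 1 << (n - 1):
        return math.nan, 0  # NaR ("not a real")

    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    # The n-1 bits that follow the sign bit, as a bit string.
    body = format(bits & ((1 << (n - 1)) - 1), f"0{n - 1}b")

    # Regime: a run of identical bits, ended by the opposite bit (or the end).
    first = body[0]
    run = len(body) - len(body.lstrip(first))
    k = run - 1 if first == "1" else -run

    rest = body[run + 1:]               # skip the regime terminator bit
    exp_bits = rest[:es].ljust(es, "0")  # exponent bits cut off by the regime read as 0
    frac_bits = rest[es:]                # whatever is left is the fraction

    e = int(exp_bits, 2) if es > 0 else 0
    f = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0

    # value = (1 + f) * 2^(k * 2^es + e)
    value = math.ldexp(1.0 + f, k * (1 << es) + e)
    return (-value if sign else value), len(frac_bits)

for pattern in (0x40000000, 0x38000000, 0x60000000, 0x7FFFFFFF):
    value, nfrac = decode_posit(pattern)
    print(f"{pattern:#010x}: value = {value!r}, fraction bits = {nfrac}")
```

Under this decoding, values near 1 keep 27 fraction bits (binary32 has 23 explicit fraction bits), while the largest representable value has none. This is consistent with the accuracy advantage appearing mainly when the matrix elements are close to 1.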

Paper Structure (19 sections, 5 equations, 8 figures, 6 tables)

Introduction
Posit arithmetic
Details of Accelerator of Linear Algebra in Posit(32,2)
FPGA implementation
GPU implementation
Evaluation of Posit(32,2) arithmetic on FPGAs and GPUs
Evaluation of GEMM on FPGAs
Evaluation and Analysis of Posit(32,2) arithmetic on GPUs
Evaluation of GEMM on GPUs
Comparison of GEMM performance on FPGAs and GPUs
Evaluation of Matrix Decomposition on FPGAs and GPUs
Evaluation of Numerical Error of Matrix Decomposition of Posit(32,2) arithmetic
Performance of Matrix Decomposition
Power Efficiency of the LU decomposition
Discussion
...and 4 more sections

Figures (8)

Figure 1: Bit sequence of POSIT Format
Figure 2: Performance of GEMM on Agilex with $\sigma = 10^{0}, 10^{-2},$ and $10^{6}$ for generating the square matrices.
Figure 3: Performance of GEMM on V100 GPU with different $\sigma$ for generating the square matrices.
Figure 4: Performance of GEMM on five GPUs with $\sigma = 10^{0}$.
Figure 5: Performance of GEMM on V100, RTX3090, RTX4090, and RX7900 with different $P_{\rm limit}$. The results measured with $P_{\rm limit} = 450$, 350, 250, 150, and 100 watts are shown as black, white, red, blue, yellow, and green bars, respectively. We set $\sigma = 10^{0}$.
...and 3 more figures
