Evaluation of Posits for Spectral Analysis Using a Software-Defined Dataflow Architecture

Sameer Deshmukh; Daniel Khankin; William Killian; John Gustafson; Elad Raz

TL;DR

This study evaluates the accuracy and performance of the posit32 number format for spectral analysis, using a software-defined dataflow architecture that maps algorithms to a DAG of integer operations and has no FPU. The authors report that posit32 is about $2\times$ more accurate than float32 for the FFT and substantially more accurate for spectral-method computations. On their dataflow platform, posit32 is about $1.82\times$ slower than IEEE 754 for an FFT of $2^{28}$ points, whereas a software simulation of posits on a CPU is up to $69.27\times$ slower than native IEEE 754 hardware. The comparison is made fair by implementing both formats with the same integer operations on a non-von Neumann, DAG-mapped architecture; posit32 uses more logical elements (LEs) and more power because of its decoding/encoding overhead. The authors propose this result as a new empirical lower bound on posit performance relative to IEEE 754 and suggest that posits can be a competitive alternative in memory-bound spectral analysis when implemented in a flexible dataflow fabric, with further gains expected from optimizing posit encoding/decoding.

Abstract

Spectral analysis plays an important role in the detection of damage in structures and in deep learning. The choice of floating-point format is crucial in determining the accuracy and performance of spectral analysis. The IEEE Std 754™ floating-point format (IEEE 754 for short) is supported by most major hardware vendors for "normal" floats. However, it has several limitations. Previous work has attempted to evaluate the posit format with respect to accuracy and performance, and the accuracy advantage of posits over IEEE 754 has been established for a variety of applications. For example, our analysis of the Fast Fourier Transform (FFT) shows $2\times$ better accuracy when using a 32-bit posit instead of a 32-bit IEEE 754 float. For spectral analysis, 32-bit posits are substantially more accurate than 32-bit IEEE 754 floats. Although posits have shown better accuracy than IEEE 754, a fair comparison of the two formats using a real hardware implementation has been lacking so far. A software simulation of the posit format on an x86 CPU is about $69.3\times$ slower than native IEEE 754 hardware for normal floats for an FFT of $2^{28}$ points. We propose the use of a software-defined dataflow architecture to evaluate the performance and accuracy of posits in spectral analysis. Our dataflow architecture uses reconfigurable logical elements that express algorithms using only integer operations. Our architecture does not have an FPU, and we express both IEEE 754 and posit arithmetic using the same integer operations within the hardware. On our dataflow architecture, the posit format is only $1.8\times$ slower than IEEE 754 for an FFT of $2^{28}$ (about 268 million) points. With this implementation, we empirically propose a new lower bound for the performance of posits compared to the IEEE 754 format.
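To make the idea of expressing posit arithmetic with integer operations more concrete, the following is a minimal sketch of decoding a posit32 bit pattern using only shifts, masks, and integer arithmetic. It assumes a 2-bit exponent field (as in the 2022 Posit Standard, and consistent with the 27 fraction bits mentioned in the Figure 2 caption). The function name `decode_posit32` and the use of Python's `Fraction` for exact results are illustrative choices, not the authors' implementation.

```python
from fractions import Fraction
from typing import Optional

NBITS = 32  # posit32 width
ES = 2      # exponent field width; assumed from the 2022 Posit Standard


def decode_posit32(bits: int) -> Optional[Fraction]:
    """Decode a 32-bit posit pattern to an exact rational (None for NaR)."""
    bits &= 0xFFFFFFFF
    if bits == 0:
        return Fraction(0)
    if bits == 0x80000000:
        return None  # NaR (Not a Real)

    negative = (bits >> (NBITS - 1)) == 1
    if negative:
        bits = (-bits) & 0xFFFFFFFF  # two's complement negation

    # Regime: run of identical bits starting just below the sign bit.
    first = (bits >> (NBITS - 2)) & 1
    run = 0
    pos = NBITS - 2
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    pos -= 1  # skip the bit that terminates the regime, if present

    # Whatever is left holds the exponent bits, then the fraction bits.
    remaining = max(pos + 1, 0)
    tail = bits & ((1 << remaining) - 1)
    if remaining >= ES:
        frac_bits = remaining - ES
        exp = tail >> frac_bits
        frac = tail & ((1 << frac_bits) - 1)
    else:
        # Exponent bits cut off by a long regime are treated as zero.
        exp = tail << (ES - remaining)
        frac_bits = 0
        frac = 0

    scale = k * (1 << ES) + exp  # total power of two
    significand = Fraction((1 << frac_bits) + frac, 1 << frac_bits)
    value = significand * Fraction(2) ** scale
    return -value if negative else value
```

For example, `decode_posit32(0x40000000)` yields 1 and `decode_posit32(0x38000000)` yields 1/2. The variable-length regime scan is the part that has no counterpart in IEEE 754 decoding, which is one plausible source of the decoding/encoding overhead the summary mentions.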

Paper Structure (19 sections, 4 equations, 10 figures, 6 tables, 3 algorithms)

Introduction
IEEE 754 Floating-Point Numbers
The Posit Specification
Posit algorithms
Dataflow Architecture
Comparison of a dataflow architecture with a CPU
Expression of a DAG on our dataflow architecture
Memory access design
Threading model
Results
Analysis of accuracy of posit32 vs. float32
Accuracy analysis of the FFT
Accuracy analysis of the spectral method
Evaluation of posit32 and float32 on our dataflow architecture
Performance of the FFT
...and 4 more sections

Figures (10)

Figure 1: Bit sequence of a 32-bit IEEE 754 float. Unlike posits, IEEE floats use a fixed number of bits for the exponent and fraction.
Figure 2: Posit32 with varying number of bits for the regime, exponent, and fraction. The uppermost diagram uses all the $27$ bits available for the fraction. The middle diagram has more bits for the regime, but fewer fraction bits. The lowermost diagram maximizes the use of the regime bits.
Figure 3: Comparison of the operations performed with floats on the left vs. posits on the right using a general-purpose CPU for addition. CPUs have dedicated hardware for normal IEEE 754 floats whereas posits are emulated on integer hardware.
Figure 4: Comparison of addition of float32 on the left and posit32 on the right on a software-controlled dataflow architecture. The computation is projected onto the hardware using an optimizing compiler and hardware projection software stack, and then executed as per the flow of data.
Figure 5: Multi-threaded execution on a general-purpose CPU (left) vs. our software-defined dataflow architecture (right).
...and 5 more figures
