Evaluating Numerical Accuracy in Mixed-Precision Computing by Dual-Delta Testing

Peichen Xie

TL;DR

Dual-Delta Testing addresses the challenge of validating numerical accuracy in mixed-precision computing by replacing a single error delta with two error distributions $\Delta_1$ and $\Delta_2$ relative to a high-precision oracle $f_\Omega$. The method formalizes a mathematical framework, presents an algorithm to compute and compare the distributions, and offers statistical tools (descriptive statistics, visualizations, and hypothesis tests) to determine equivalence or superiority of implementations. Through matrix-multiplication case studies, the approach detects both equivalent accuracy and latent numerical issues, and it validates fixes by restoring distributional parity with the oracle. This methodology provides a robust, generalizable protocol for rigorously assessing numerical accuracy across mixed-precision implementations and hardware platforms.

Abstract

Mixed-precision computing has become increasingly important in modern high-performance computing and machine learning applications. When implementing custom mixed-precision functions, such as fused operators, optimized GPU kernels, or quantized inference paths, it is critical to verify their numerical accuracy. Traditional approaches typically compare the custom implementation against a reference using a single error metric. However, this single-delta approach provides limited insight into whether the observed errors are inherent to the precision level or specific to the implementation. This paper introduces Dual-Delta Testing, a systematic methodology that evaluates two error distributions against a high-precision oracle, enabling rigorous comparison between a custom implementation and a baseline reference. We present the mathematical framework, algorithmic formulation, statistical analysis techniques, and practical examples demonstrating the methodology's effectiveness in evaluating numerical accuracy.
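The idea of measuring two implementations against one oracle can be sketched in a few lines of NumPy. This is a minimal illustration and not the paper's algorithm: the function names (`dual_delta`, `relative_error`, `f_custom`, `f_baseline`), the relative Frobenius-norm error metric, the float64 oracle, and the input sizes are all assumptions chosen for the example.

```python
import numpy as np


def relative_error(y, y_ref):
    """Relative Frobenius-norm error of y against the oracle output y_ref."""
    diff = np.asarray(y, dtype=np.float64) - y_ref
    return np.linalg.norm(diff) / np.linalg.norm(y_ref)


def dual_delta(f1, f2, f_oracle, gen_input, n_trials=200, seed=0):
    """Return two error samples, one per implementation, on shared inputs."""
    rng = np.random.default_rng(seed)
    delta1 = np.empty(n_trials)
    delta2 = np.empty(n_trials)
    for i in range(n_trials):
        a, b = gen_input(rng)
        y_ref = f_oracle(a, b)  # high-precision reference output
        delta1[i] = relative_error(f1(a, b), y_ref)
        delta2[i] = relative_error(f2(a, b), y_ref)
    return delta1, delta2


def gen_input(rng, m=32, k=64, n=32):
    """Hypothetical input generator: random float16 matrices."""
    a = rng.standard_normal((m, k)).astype(np.float16)
    b = rng.standard_normal((k, n)).astype(np.float16)
    return a, b


def f_oracle(a, b):
    """Assumed oracle: the same product evaluated in float64."""
    return a.astype(np.float64) @ b.astype(np.float64)


def f_baseline(a, b):
    """Assumed baseline: float32 compute, float16 output."""
    return (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float16)


def f_custom(a, b):
    """Assumed custom variant: same math via the transposed product."""
    a32, b32 = a.astype(np.float32), b.astype(np.float32)
    return (b32.T @ a32.T).T.astype(np.float16)


delta1, delta2 = dual_delta(f_custom, f_baseline, f_oracle, gen_input)
print(f"median delta1 = {np.median(delta1):.3e}")
print(f"median delta2 = {np.median(delta2):.3e}")
```

Both implementations see exactly the same inputs in each trial, so any difference between the two samples reflects the implementations rather than the data.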

Paper Structure (24 sections, 3 theorems, 4 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Motivation
The Problem with Single-Delta Testing
The Solution of Dual-Delta Testing
Mathematical Framework
Notation and Definitions
The Dual-Delta Formulation
Algorithmic Formulation
The Core Algorithm
Implementation Considerations
Input Generation
Oracle Selection
Error Metric Selection
Statistical Analysis
Descriptive Statistics
...and 9 more sections

Key Result

Proposition 1

If the distributions $\Delta_1$ and $\Delta_2$ are statistically indistinguishable, we conclude that $f_1$ and $f_2$ exhibit comparable numerical accuracy relative to the oracle.
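One way to make "statistically indistinguishable" operational is a two-sample test on the error samples. The sketch below uses a permutation test on the Kolmogorov-Smirnov statistic, written in plain NumPy. The choice of test, the function names, and the synthetic samples are assumptions for illustration; this excerpt does not say which hypothesis tests the paper uses.

```python
import numpy as np


def ks_statistic(x, y):
    """Largest gap between the empirical CDFs of samples x and y."""
    x = np.sort(x)
    y = np.sort(y)
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return np.max(np.abs(cdf_x - cdf_y))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """P-value for the null hypothesis that x and y share one distribution."""
    rng = np.random.default_rng(seed)
    observed = ks_statistic(x, y)
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if ks_statistic(perm[:x.size], perm[x.size:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic error samples (hypothetical, not results from the paper).
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=-8.0, sigma=0.1, size=50)
p_same = permutation_pvalue(sample, sample.copy(), n_perm=500)
p_shifted = permutation_pvalue(sample, sample * 10.0, n_perm=500)
print(f"identical samples: p = {p_same:.3f}")
print(f"10x larger errors: p = {p_shifted:.3f}")
```

A large p-value only means the test found no evidence of a difference at the given sample size. It is worth pairing it with descriptive statistics and a plot of both distributions before treating two implementations as comparable.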

Figures (3)

Figure 1: Error distribution comparison for $128 \times 128$ matrix multiplication. The GPU (blue) and CPU (orange) error distributions overlap almost entirely, indicating equivalent numerical accuracy.
Figure 2: Error distribution comparison for $128 \times 4096$ by $4096 \times 128$ matrix multiplication. The GPU (blue) and CPU (orange) error distributions are clearly separated, revealing a significant accuracy degradation in the default GPU implementation.
Figure 3: Error distribution comparison for $128 \times 4096$ by $4096 \times 128$ matrix multiplication after disabling reduced-precision reduction. The GPU (blue) and CPU (orange) error distributions now overlap almost entirely, confirming restored accuracy parity.
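The effect described in Figures 2 and 3 can be imitated on a CPU by changing the precision of the running sum in a chunked matrix product. This is a toy simulation under stated assumptions, not the GPU experiment from the paper: the chunk size, the matrix sizes, and the use of a float16 accumulator as a stand-in for reduced-precision reduction are all choices made for this sketch.

```python
import numpy as np


def matmul_chunked(a, b, acc_dtype, chunk=32):
    """Multiply float16 matrices, keeping the running sum in acc_dtype."""
    out = np.zeros((a.shape[0], b.shape[1]), dtype=acc_dtype)
    for s in range(0, a.shape[1], chunk):
        a_part = a[:, s:s + chunk].astype(np.float32)
        b_part = b[s:s + chunk].astype(np.float32)
        partial = a_part @ b_part
        # The sum is rounded to acc_dtype after every chunk.
        out = out + partial.astype(acc_dtype)
    return out.astype(np.float16)


def median_errors(k, m=16, n=16, n_trials=20, seed=0):
    """Median relative error for float16 and float32 accumulation."""
    rng = np.random.default_rng(seed)
    err_low, err_high = [], []
    for _ in range(n_trials):
        a = rng.standard_normal((m, k)).astype(np.float16)
        b = rng.standard_normal((k, n)).astype(np.float16)
        ref = a.astype(np.float64) @ b.astype(np.float64)  # float64 oracle
        ref_norm = np.linalg.norm(ref)
        for acc_dtype, errs in ((np.float16, err_low), (np.float32, err_high)):
            y = matmul_chunked(a, b, acc_dtype).astype(np.float64)
            errs.append(np.linalg.norm(y - ref) / ref_norm)
    return np.median(err_low), np.median(err_high)


low_short, high_short = median_errors(k=128)
low_long, high_long = median_errors(k=4096)
ratio_short = low_short / high_short
ratio_long = low_long / high_long
print(f"k=128:  float16-acc / float32-acc error ratio = {ratio_short:.2f}")
print(f"k=4096: float16-acc / float32-acc error ratio = {ratio_long:.2f}")
```

In this simulation the gap between the two accumulators widens as the inner dimension grows, because rounding error is added at every accumulation step. Real GPU kernels accumulate in different orders and block sizes, so the magnitudes here should not be read as predictions for any particular hardware.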

Theorems & Definitions (6)

Definition 1: Implementation
Definition 2: Oracle
Definition 3: Error Metric
Proposition 1: Equivalence
Proposition 2: Numerical Accuracy
Proposition 3: Numerical Stability
