Generalized Orders of Magnitude for Scalable, Parallel, High-Dynamic-Range Computation

Franz A. Heinsen, Leo Kozachkov

TL;DR

The paper introduces Generalized Orders of Magnitude (GOOMs), a representation based on complex logarithms: each GOOM is a complex number $x'$ whose exponential $\exp(x')$ is real, which keeps high-dynamic-range computations numerically stable. Floating-point numbers are a special case of GOOMs, so GOOMs integrate with existing hardware; the authors implement GOOMs in PyTorch and develop a scalable prefix-scan framework for parallel computation. They demonstrate three applications (long chains of matrix products, parallel Lyapunov-spectrum estimation, and non-diagonal state-space RNNs) where floating-point methods fail or are impractical, reporting accurate results with markedly reduced computation time. The work combines a formal GOOM theory, mappings to and from floating-point formats, the log-matrix-multiplication-exp (LMME) operation, and a selective-resetting technique that enables time-parallel analyses.

Abstract

Many domains, from deep learning to finance, require compounding real numbers over long sequences, often leading to catastrophic numerical underflow or overflow. We introduce generalized orders of magnitude (GOOMs), a principled extension of traditional orders of magnitude that incorporates floating-point numbers as a special case, and which in practice enables stable computation over significantly larger dynamic ranges of real numbers than previously possible. We implement GOOMs, along with an efficient custom parallel prefix scan, to support native execution on parallel hardware such as GPUs. We demonstrate that our implementation of GOOMs outperforms traditional approaches with three representative experiments, all of which were previously considered impractical or impossible, and now become possible and practical: (1) compounding real matrix products far beyond standard floating-point limits; (2) estimating spectra of Lyapunov exponents in parallel, orders of magnitude faster than with previous methods, applying a novel selective-resetting method to prevent state colinearity; and (3) capturing long-range dependencies in deep recurrent neural networks with non-diagonal recurrent states, computed in parallel via a prefix scan, without requiring any form of stabilization. Our results show that our implementation of GOOMs, combined with efficient parallel scanning, offers a scalable and numerically robust alternative to conventional floating-point numbers for high-dynamic-range applications.
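The underflow problem the abstract describes, and the way a complex-logarithm representation sidesteps it, can be illustrated with a minimal pure-Python sketch. It is not the authors' PyTorch implementation: the helper names `to_goom` and `sign_and_log_magnitude` are hypothetical, and the mapping $x' = \log|x| + i\pi$ for negative $x$ is an assumption consistent with the requirement that $\exp(x')$ be real.

```python
import math

def to_goom(x):
    """Map a nonzero real x to a complex logarithm (assumed form: log|x| + i*pi if x < 0)."""
    if x == 0.0:
        raise ValueError("zero needs special handling (log 0 = -inf); omitted in this sketch")
    return complex(math.log(abs(x)), math.pi if x < 0 else 0.0)

def sign_and_log_magnitude(g):
    """Recover the sign and the natural-log magnitude without exponentiating."""
    sign = 1 if math.cos(g.imag) > 0 else -1
    return sign, g.real

factors = [-0.5] * 3001

# Conventional floating point: the running product underflows to zero.
product = 1.0
for f in factors:
    product *= f

# Complex-log domain: multiplication becomes addition, so nothing underflows.
log_product = 0j
for f in factors:
    log_product += to_goom(f)

sign, log_mag = sign_and_log_magnitude(log_product)
print(product)        # the float product has lost all information about magnitude
print(sign, log_mag)  # the sign and log-magnitude of the true product are still available
```

The true product has magnitude $0.5^{3001}$, far below the smallest positive Float64, yet its sign and logarithm remain representable as an ordinary complex number.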

Paper Structure

This paper contains 49 sections, 45 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Longest chain multiplying random normal square matrices without catastrophic numerical error on an Nvidia GPU, up to a maximum of 1M steps, with real numbers represented as Float32 and Float64, and with GOOMs represented as Complex64, applying a function we call log-matrix-multiplication-exp, or "$\mathrm{LMME}$" (see the paper's subsection on log-matrix-multiplication-exp). Each point represents the mean of 30 runs, with vertical error bars indicating the standard error across those runs. For every run, at each step, we sample matrix elements independently from $\mathcal{N}(0, 1)$.
  • Figure 2: At the top, we show the range of magnitudes with positive sign representable by Float32 or Float64, up to a maximum $c$, and their approximate share of $n$, the number of possible bit sequences. For Float32, $n = 2^{32}$; for Float64, $n = 2^{64}$. At the bottom, we show the same magnitudes, mapped to a complex GOOM's real component, represented by the same floating-point format. The shares of $n$ are approximate to account for bitwise differences between Float32 and Float64. For magnitudes with negative sign, the diagram is identical.
  • Figure 3: Time to estimate the spectrum of Lyapunov exponents sequentially, as a multiple of time to estimate it in parallel, as we increase the number of steps, for all dynamical systems in a dataset spanning multiple scientific disciplines [gilpin2023chaosinterpretablebenchmarkforecasting, gilpin2023modelscaleversusdomain], on a single Nvidia GPU. The improvement starts tapering off at $10^5$ steps because the GPU's compute capacity is saturated by parallel QR decompositions at all steps. The paper's appendix on Lyapunov execution time shows plots by system.
  • Figure 4: Examples of training dynamics for the RNN we implement, capturing sequential dependencies with non-diagonal recurrences, computed in parallel via a prefix scan, without any form of stabilization. Left: Natural language generation on The Pile [gao2020pile800gbdatasetdiverse], with a 124M-parameter RNN incorporating a 50257 token-id vocabulary and 24 layers; we stopped training at 10B tokens. Right: Classification, from last pixel value, of sequences of 784 pixels from MNIST [lecun2010mnist], with a 12.8M-parameter RNN incorporating a 256 token-id vocabulary and 8 layers. See our source code for replicating all training runs, including for both tasks shown here.
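The caption of Figure 1 refers to log-matrix-multiplication-exp (LMME). The following is a naive sketch of what such an operation could look like, written in pure Python over nested lists rather than PyTorch tensors. The function names (`to_goom`, `from_goom`, `signed_log_sum_exp`, `lmme`) are hypothetical, and the paper's actual LMME may differ in how it scales and batches the computation. The idea shown is only that each output entry is the logarithm of a sum of exponentials, computed after subtracting the largest real part so that nothing overflows.

```python
import cmath
import math

def to_goom(x):
    """Assumed mapping: log|x| + i*pi for negative x; zero maps to -inf."""
    if x == 0.0:
        return complex(-math.inf, 0.0)
    return complex(math.log(abs(x)), math.pi if x < 0 else 0.0)

def from_goom(g):
    """Exponentiate back to a real number (only safe when exp does not overflow)."""
    return cmath.exp(g).real

def signed_log_sum_exp(terms):
    """log(sum(exp(t))) for complex-log terms whose exponentials are real."""
    shift = max(t.real for t in terms)
    if shift == -math.inf:
        return complex(-math.inf, 0.0)
    total = sum(
        math.exp(t.real - shift) * (1.0 if math.cos(t.imag) > 0 else -1.0)
        for t in terms
    )
    if total == 0.0:  # exact cancellation
        return complex(-math.inf, 0.0)
    return complex(shift + math.log(abs(total)), math.pi if total < 0 else 0.0)

def lmme(a, b):
    """Naive log-matrix-multiplication-exp: the complex log of exp(a) @ exp(b)."""
    inner, cols = len(b), len(b[0])
    return [
        [signed_log_sum_exp([row[k] + b[k][j] for k in range(inner)]) for j in range(cols)]
        for row in a
    ]

# Small check against ordinary matrix multiplication.
A = [[1.0, -2.0], [3.0, 4.0]]
B = [[-1.0, 0.5], [2.0, -3.0]]
small = lmme([[to_goom(x) for x in r] for r in A], [[to_goom(x) for x in r] for r in B])
small_real = [[from_goom(g) for g in r] for r in small]

# Magnitudes far beyond Float64: the row represents [e^1000, -e^1000].
huge = [[complex(1000.0, 0.0), complex(1000.0, math.pi)]]
result = lmme(huge, [[to_goom(3.0)], [to_goom(2.0)]])  # 3*e^1000 - 2*e^1000 = e^1000
```

In the second case `math.exp(1000.0)` would raise an overflow error, while the log-domain result stays an ordinary complex number with real part close to 1000.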

Theorems & Definitions (2)

  • Example 1: Scalar Multiplication in $\mathbb{R}$ Becomes Addition in ${\mathbb{C}}'$
  • Example 2: Dot Product in $\mathbb{R}$ Becomes log-sum-exp in ${\mathbb{C}}'$
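The two examples listed above can be worked through briefly. This is a sketch under the assumption that a nonzero real $x$ maps to $x' = \log|x| + i\pi\,[x<0]$, so that $\exp(x') = x$; the paper's own definitions and notation may differ in detail.

```latex
% Assumption: x' = \log|x| + i\pi\,[x < 0] for nonzero real x, so that \exp(x') = x.
\begin{align*}
% Example 1: scalar multiplication becomes addition.
x_1 x_2 &= \exp(x_1')\,\exp(x_2') = \exp(x_1' + x_2'), \\
\text{e.g. } x_1 = -2,\ x_2 = 3:\quad
x_1' + x_2' &= (\log 2 + i\pi) + \log 3 = \log 6 + i\pi,
\qquad \exp(\log 6 + i\pi) = -6. \\[4pt]
% Example 2: a dot product becomes log-sum-exp.
\sum_k a_k b_k &= \sum_k \exp(a_k' + b_k')
\quad\Longrightarrow\quad
\Big(\sum_k a_k b_k\Big)' = \log \sum_k \exp(a_k' + b_k')
\quad \text{(up to multiples of } 2\pi i\text{)}.
\end{align*}
```

Because every term $\exp(a_k' + b_k')$ is real, the sum inside the logarithm is real as well, and its sign ends up in the imaginary part of the result.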