GPU Accelerated Newton for Taylor Series Solutions of Polynomial Homotopies in Multiple Double Precision

Jan Verschelde

TL;DR

This work computes accurate Taylor series expansions of solution coordinates along polynomial homotopies in multiple double arithmetic, accelerated by GPUs. It combines Newton's method on power series with linearized systems $A(t){\bf x}(t)=b(t)$, convolutions for evaluation and differentiation, and a blocked least-squares solver, supported by a staged, staggered approach to precision and order. The paper analyzes arithmetic intensity, complex vectorization, and memory layouts to realize GPU throughput, demonstrates teraflop-scale convolution performance on recent NVIDIA GPUs, and shows how precision and problem encoding affect performance and convergence. Publicly available CAMPARY-based implementations, careful kernel inlining, and shared-memory strategies enable scalable experimentation on large systems, with practical implications for Padé construction and for locating nearby singularities via Fabry’s criterion. Overall, the results indicate that GPU acceleration can offset much of the multiprecision overhead, enabling efficient power series continuation for large polynomial systems.
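
To make the linearized Newton step concrete, here is a minimal CPU sketch in plain double precision, not the paper's GPU code. It assumes a hypothetical scalar example, $f(x,t) = x^2 - (1+t)$, whose solution is the series of $\sqrt{1+t}$. The function names `convolve`, `solve_linearized`, and `newton_sqrt_series` are illustrative only. Products of series are convolutions, and the linearized system $A(t)\,\Delta x(t) = -f(x(t),t)$ reduces to forward substitution on a lower triangular Toeplitz system.

```python
def convolve(a, b, order):
    """Coefficients of a(t)*b(t), truncated after degree `order`."""
    c = [0.0] * (order + 1)
    for i, ai in enumerate(a[: order + 1]):
        for j, bj in enumerate(b[: order + 1 - i]):
            c[i + j] += ai * bj
    return c


def solve_linearized(a, b, order):
    """Solve a(t)*x(t) = b(t) coefficient by coefficient (forward
    substitution on a lower triangular Toeplitz system); needs a[0] != 0."""
    x = [0.0] * (order + 1)
    for k in range(order + 1):
        s = b[k] - sum(a[j] * x[k - j] for j in range(1, k + 1))
        x[k] = s / a[0]
    return x


def newton_sqrt_series(order, steps):
    """Newton's method on f(x, t) = x^2 - (1 + t), starting at x(t) = 1.
    Assumes order >= 1."""
    x = [1.0] + [0.0] * order
    rhs = [1.0, 1.0] + [0.0] * (order - 1)           # coefficients of 1 + t
    for _ in range(steps):
        fx = convolve(x, x, order)                    # evaluate x(t)^2
        residual = [r - f for r, f in zip(rhs, fx)]   # -f(x(t), t)
        jac = [2.0 * c for c in x]                    # df/dx = 2 x(t)
        dx = solve_linearized(jac, residual, order)
        x = [xi + di for xi, di in zip(x, dx)]
    return x


series = newton_sqrt_series(order=8, steps=4)
print(series[:4])   # leading coefficients of sqrt(1 + t)
```

Each Newton step roughly doubles the number of correct coefficients, so four steps suffice for order 8 in this example. In the setting of the paper, the scalars become matrices and vectors, and the arithmetic is done in multiple double precision.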

Abstract

A polynomial homotopy is a family of polynomial systems, typically in one parameter $t$. Our problem is to compute power series expansions of the coordinates of the solutions in the parameter $t$, accurately, using multiple double arithmetic. One application of this problem is the location of the nearest singular solution in a polynomial homotopy, via the theorem of Fabry. Power series serve as input to construct Padé approximations. Exploiting the massive parallelism of Graphics Processing Units capable of performing several trillion floating-point operations per second, the objective is to compensate for the cost overhead caused by arithmetic with power series in multiple double precision. The application of Newton's method for this problem requires the evaluation and differentiation of polynomials, followed by solving a blocked lower triangular linear system. Experimental results are obtained on NVIDIA GPUs, in particular the RTX 2080, RTX 4080, P100, V100, and A100. Code generated by the CAMPARY software is used to obtain results in double double, quad double, and octo double precision. The programs in this study are self-contained and available in a public GitHub repository under the GPL-v3.0 License.
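
As an illustration of what multiple double arithmetic means at its smallest scale, the following sketch represents a double double as an unevaluated sum of two doubles `(hi, lo)`. This is not the CAMPARY-generated code used in the paper. It is a simplified textbook variant built on Knuth's two-sum and Dekker's splitting, and the names `two_sum`, `two_prod`, `dd_add`, and `dd_mul` are chosen here for illustration.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b)
    and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Dekker's split of a double into a high and a low half."""
    t = 134217729.0 * a          # 2**27 + 1
    hi = t - (t - a)
    return hi, a - hi


def two_prod(a, b):
    """Error-free product: a * b = p + e exactly
    (assuming no overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Simplified double double addition of x = (hi, lo) and y = (hi, lo)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)


def dd_mul(x, y):
    """Simplified double double multiplication."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return two_sum(p, e)


tiny = 2.0 ** -80
print(1.0 + tiny == 1.0)                    # the small term is lost in a double
print(dd_add((1.0, 0.0), (tiny, 0.0)))      # the low part keeps it
```

Every double double operation expands into many ordinary floating-point operations, which is the cost overhead that the GPU acceleration is meant to compensate for. Quad double and octo double extend the same idea to four and eight doubles.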

Paper Structure (22 sections, 1 theorem, 18 equations, 5 figures, 9 tables)

Introduction
Problem Statement
Multiprecision Arithmetic
Numerical Condition of Taylor Series
Linearized Series and Newton's Method
Columns of Monomials
Staggered Computations
Accelerating Newton's Method
Arithmetic Intensity of Convolutions
Complex Vectorized Arithmetic
Accelerated Least Squares
Accelerated Updates and Residuals
Staging Multiple Doubles
Inlining of Arithmetical Kernels
Shared Memory and Registers
...and 7 more sections

Key Result

Theorem 1.1

If for the series $x(t) = c_0 + c_1 t + c_2 t^2 + \cdots + c_d t^d + c_{d+1} t^{d+1} + \cdots$, we have $\lim_{d \rightarrow \infty} c_d/c_{d+1} = z$, then the radius of the disk of convergence is $|z|$.
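
A small numerical check of the ratio theorem, under assumptions chosen here for illustration: the test function $1/((2-t)(3-t))$ is hypothetical and not taken from the paper. Its nearest singularity is at $t = 2$, so the ratio $c_d/c_{d+1}$ should approach 2 as $d$ grows. The sketch uses plain double precision.

```python
def series_coefficients(order):
    """Taylor coefficients at t = 0 of 1/((2 - t)(3 - t)),
    computed as the convolution of two geometric series."""
    a = [0.5 ** (k + 1) for k in range(order + 1)]           # 1/(2 - t)
    b = [(1.0 / 3.0) ** (k + 1) for k in range(order + 1)]   # 1/(3 - t)
    return [sum(a[i] * b[k - i] for i in range(k + 1))
            for k in range(order + 1)]


def fabry_ratio(c):
    """Ratio c_d / c_{d+1} of the last two available coefficients."""
    return c[-2] / c[-1]


coeffs = series_coefficients(60)
z = fabry_ratio(coeffs)
radius = abs(z)
print(radius)   # close to 2, the distance to the nearest singularity
```

The ratio only converges in the limit, so in practice the order of the series and the working precision both limit how accurately the singularity can be located.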

Figures (5)

Figure 1: Percentage of each type of accelerated computation for a one column monomial system in octo double precision, on V100, with legend in Table \ref{tab6kernels}.
Figure 2: Percentage of each type of accelerated computation for a two column monomial system in octo double precision, on V100, with legend in Table \ref{tab6kernels}.
Figure 3: Performance in gigaflops on the P100 (top) and on the V100 (bottom) to evaluate and differentiate at series in octo double precision versus the order of the series.
Figure 4: In doubling the precision, the wall clock times on P100 (top) and V100 (bottom) less than double as the proportion of the elapsed times spent by all kernels increases, on one column of 1,024 monomials.
Figure 5: Doubling the precision on the RTX 2080, on one column of 512 monomials.

Theorems & Definitions (1)

Theorem 1.1: the ratio theorem of Fabry [Fab1896]
