GPU Accelerated Newton for Taylor Series Solutions of Polynomial Homotopies in Multiple Double Precision
Jan Verschelde
TL;DR
This work tackles the problem of computing accurate Taylor series expansions of solution coordinates along polynomial homotopies using multiple double arithmetic, accelerated by GPUs. It combines Newton's method on power series with linearized systems $A(t){\bf x}(t)={\bf b}(t)$, convolutions for evaluation and differentiation, and a blocked least-squares solver, supported by a staged, staggered approach to precision and order. The paper analyzes arithmetic intensity, complex vectorization, and memory layouts to realize GPU throughput, demonstrates teraflop-scale convolution performance on recent NVIDIA GPUs, and shows how precision and problem encoding affect performance and convergence. Publicly available implementations based on code generated by CAMPARY, careful kernel inlining, and shared-memory strategies enable scalable experimentation on large systems, with practical implications for constructing Padé approximations and locating nearby singularities via the theorem of Fabry. Overall, the results indicate that GPU acceleration can meaningfully offset the overhead of multiple double precision, enabling efficient power series continuation for large polynomial systems.
Abstract
A polynomial homotopy is a family of polynomial systems, typically in one parameter $t$. Our problem is to compute power series expansions of the coordinates of the solutions in the parameter $t$, accurately, using multiple double arithmetic. One application of this problem is the location of the nearest singular solution in a polynomial homotopy, via the theorem of Fabry. Power series also serve as input to construct Padé approximations. By exploiting the massive parallelism of Graphics Processing Units, which are capable of performing several trillion floating-point operations per second, the objective is to compensate for the cost overhead caused by arithmetic with power series in multiple double precision. The application of Newton's method to this problem requires the evaluation and differentiation of polynomials, followed by the solution of a blocked lower triangular linear system. Experimental results are obtained on NVIDIA GPUs, in particular the RTX 2080, RTX 4080, P100, V100, and A100. Code generated by the CAMPARY software is used to obtain results in double double, quad double, and octo double precision. The programs in this study are self-contained and available in a public GitHub repository under the GPL-v3.0 License.
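To illustrate why a linear system over truncated power series becomes a blocked lower triangular system, consider $A(t){\bf x}(t)={\bf b}(t)$ with $A(t)=A_0+A_1t+\cdots$ and ${\bf b}(t)={\bf b}_0+{\bf b}_1t+\cdots$. Equating coefficients of $t^k$ gives $A_0{\bf x}_k={\bf b}_k-\sum_{j=1}^{k}A_j{\bf x}_{k-j}$, so every block row is solved with the same leading matrix $A_0$. The following is a minimal sequential sketch under stated assumptions, not the GPU code of this study: it uses exact rational arithmetic as a stand-in for multiple double precision, plain Gauss-Jordan elimination as a stand-in for the solver applied to the diagonal block, and it assumes $A_0$ is square and nonsingular. The function names `solve_dense` and `solve_series_system` are hypothetical.

```python
from fractions import Fraction


def solve_dense(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination.

    Exact fractions stand in for multiple double precision (an assumption
    made for this illustration only). A must be nonsingular.
    """
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    for col in range(n):
        # Arithmetic is exact, so any nonzero pivot will do.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def solve_series_system(A, b):
    """Solve A(t) x(t) = b(t) modulo t^(d+1).

    A is a list of n-by-n coefficient matrices A_0, A_1, ...
    b is a list of d+1 coefficient vectors b_0, ..., b_d.
    Returns the coefficient vectors x_0, ..., x_d.
    """
    n = len(b[0])
    x = []
    for k in range(len(b)):
        # Update the right-hand side: b_k - sum_{j=1..k} A_j x_{k-j}.
        rhs = [Fraction(v) for v in b[k]]
        for j in range(1, min(k, len(A) - 1) + 1):
            for i in range(n):
                rhs[i] -= sum(Fraction(A[j][i][c]) * x[k - j][c]
                              for c in range(n))
        # Every block row is solved with the same leading matrix A_0.
        x.append(solve_dense(A[0], rhs))
    return x


# Small check: b(t) was built from x(t) = [1, 2] + [3, -1] t.
A = [[[2, 1], [1, 1]],   # A_0
     [[1, 0], [0, 1]]]   # A_1
b = [[4, 3], [6, 4], [3, -1]]
x = solve_series_system(A, b)
print(x)
```

In this sketch the leading matrix is eliminated again for every coefficient, which keeps the code short; a practical solver would factor $A_0$ once and reuse the factorization for all block rows.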
