Neural Acceleration of Incomplete Cholesky Preconditioners

Joshua Dennis Booth, Hongyang Sun, Trevor Garnett

TL;DR

It is demonstrated that a simple artificial neural network trained either at compile time or in parallel to the running application on a GPU can provide an incomplete sparse Cholesky factorization that can be used as a preconditioner.

Abstract

The solution of a sparse system of linear equations is ubiquitous in scientific applications. Iterative methods, such as the Preconditioned Conjugate Gradient method (PCG), are normally chosen over direct methods due to memory and computational complexity constraints. However, the efficiency of these methods depends on the preconditioner utilized. Developing a preconditioner normally requires some insight into the sparse linear system and a choice of trade-off between the cost of generating the preconditioner and the resulting reduction in the number of iterations. Incomplete factorization methods tend to be used as black-box methods to generate these preconditioners but may fail for a number of reasons. These reasons include numerical issues that require searching for adequate scaling, shifting, and fill-in, all while relying on a difficult-to-parallelize algorithm. With the move towards heterogeneous computing, many sparse applications leave GPUs underutilized, since GPUs are optimized for dense tensor workloads such as training neural networks. In this work, we demonstrate that a simple artificial neural network trained either at compile time or in parallel to the running application on a GPU can provide an incomplete sparse Cholesky factorization that can be used as a preconditioner. This generated preconditioner is as good as or better, in terms of reduction of iterations, than the one found using multiple preconditioning techniques such as scaling and shifting. Moreover, the proposed method never fails to produce a preconditioner, and the preconditioner it produces always reduces the iteration count.
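As a rough illustration of the idea in the abstract, the sketch below trains a lower-triangular factor $L$, restricted to the nonzero pattern of the lower triangle of $A$, so that $L(L^{T}x)$ matches $Ax$ on sample vectors. It is a minimal, dependency-free sketch and not the authors' implementation: the name `train_factor`, the squared-error loss, the plain gradient-descent update, the diagonal initialization, the random Gaussian samples, and the small tridiagonal test matrix are all assumptions made for illustration.

```python
import random


def matvec(M, v):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def train_factor(A, num_samples=32, steps=400, lr=0.01, seed=0):
    """Fit L (lower triangular, same pattern as tril(A)) so that L L^T x ~ A x.

    All hyperparameters here are illustrative assumptions.
    """
    n = len(A)
    rng = random.Random(seed)
    # Fixed "edges": only entries in the lower-triangular pattern of A may change.
    mask = [[1.0 if (j <= i and A[i][j] != 0.0) else 0.0 for j in range(n)]
            for i in range(n)]
    # Assumed initialization: square roots of the diagonal of A.
    L = [[A[i][i] ** 0.5 if i == j else 0.0 for j in range(n)] for i in range(n)]
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_samples)]
    targets = [matvec(A, x) for x in samples]
    history = []
    for _ in range(steps):
        Lt = transpose(L)
        grad = [[0.0] * n for _ in range(n)]
        loss = 0.0
        for x, y in zip(samples, targets):
            z = matvec(Lt, x)                              # hidden layer: L^T x
            r = [p - q for p, q in zip(matvec(L, z), y)]   # output error: L z - A x
            g = matvec(Lt, r)                              # error propagated to hidden layer
            loss += 0.5 * sum(v * v for v in r)
            for i in range(n):
                for k in range(n):
                    # L appears in both layers, so both contribute to the gradient.
                    grad[i][k] += r[i] * z[k] + x[i] * g[k]
        history.append(loss / num_samples)
        for i in range(n):
            for k in range(n):
                L[i][k] -= lr * mask[i][k] * grad[i][k] / num_samples
    return L, history


# Small symmetric positive definite test matrix (tridiagonal, chosen for illustration).
A = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
L, history = train_factor(A)
print("initial loss:", history[0], "final loss:", history[-1])
```

Because the mask zeroes every update outside the pattern, the trained $L$ keeps the sparsity of the lower triangle of $A$, which is the property that makes it usable as an incomplete factor.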

Paper Structure (16 sections, 2 equations, 6 figures, 2 tables)

This paper contains 16 sections, 2 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Neural network representation of $Ax=y$. The input nodes ($x_i$) represent the elements of vector $x$, the output nodes ($y_j$) represent the elements of vector $y$, and the edge weights are taken from nonzero elements of the sparse matrix $A$.
  • Figure 2: Neural network model of $LL^{T}x$. Here, the nonzero pattern (i.e., the edges) is the same as the nonzero pattern in Figure 1. However, one hidden layer is added to hold the product $L^{T}x$. While the edges themselves are fixed by the provided pattern, their numerical values change during training via backpropagation.
  • Figure 3: Reconstruction of an MNIST image (number nine) as a matrix with an increasing number of samples. The first two rows of images provide the visual reconstructions and the bottom figure provides the error in terms of the Frobenius norm of the difference between the original and reconstructed images. We note that it is difficult to even make out the number with fewer than 24 samples and that the error norm only decreases at 28 samples (i.e., when the number of samples equals the dimension of the image).
  • Figure 4: Number of iterations to converge to a solution when the sparse matrix is in its natural ordering. The bars represent the raw number of iterations and the lines represent the average number of iterations for each method across all 24 matrices. In many cases, traditional PCG fails because the incomplete factorization fails. In several cases, even scaling with shifting (ShCG) fails. The only methods that work in all cases while consistently reducing the iteration count are the two neural-network-based methods.
  • Figure 5: Number of iterations to converge to a solution when the sparse matrix is in the RCM ordering. The bars represent the raw number of iterations and the lines represent the average number of iterations for each method across all 24 matrices. We notice that the ordering does not seem to impact the number of iterations required by our method.
  • ...and 1 more figure
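The iteration counts in Figures 4 and 5 come from running conjugate gradient with and without a preconditioner. The sketch below shows, under stated assumptions, how a lower-triangular factor $L$ is applied as the preconditioner $M = LL^{T}$ inside PCG through one forward and one backward triangular solve. The function names, the $3 \times 3$ test matrix, and the hand-written factor are hypothetical; for this tridiagonal matrix the factor happens to be exact, which stands in for a good incomplete factor.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y


def backward_solve(L, y):
    """Solve L^T x = y using the entries of lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x


def pcg(A, b, L=None, tol=1e-10, max_iter=100):
    """Conjugate gradient; if L is given, precondition with M = L L^T."""
    if L is not None:
        apply_m_inv = lambda r: backward_solve(L, forward_solve(L, r))
    else:
        apply_m_inv = lambda r: list(r)  # no preconditioner: M = I
    x = [0.0] * len(b)
    r = list(b)                          # residual for the zero initial guess
    z = apply_m_inv(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = apply_m_inv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Illustrative SPD matrix and a factor with L L^T = A (no fill-in is needed here).
A = [[4.0, 2.0, 0.0], [2.0, 5.0, 2.0], [0.0, 2.0, 5.0]]
L = [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

x_plain, iters_plain = pcg(A, b)
x_pre, iters_pre = pcg(A, b, L)
print("CG iterations:", iters_plain, "PCG iterations:", iters_pre)
```

With a factor this accurate the preconditioned solve needs fewer iterations than plain CG; a genuinely incomplete factor would typically land somewhere in between, which is the quantity the figures compare across methods.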