Neural incomplete factorization: learning preconditioners for the conjugate gradient method

Paul Häusner; Ozan Öktem; Jens Sjölund

TL;DR

A computationally efficient, data-driven approach to generating effective preconditioners: the typically hand-engineered preconditioner is replaced by the output of a graph neural network, which produces an incomplete factorization of the matrix.

Abstract

The convergence of the conjugate gradient method for solving large-scale and sparse linear equation systems depends on the spectral properties of the system matrix, which can be improved by preconditioning. In this paper, we develop a computationally efficient data-driven approach to accelerate the generation of effective preconditioners. To this end, we replace the typically hand-engineered preconditioners with the output of graph neural networks. Our method generates an incomplete factorization of the matrix and is therefore referred to as neural incomplete factorization (NeuralIF). Optimizing the condition number of the linear system directly is computationally infeasible. Instead, we utilize a stochastic approximation of the Frobenius loss that requires only matrix-vector multiplications, which makes training efficient. At the core of our method is a novel message-passing block, inspired by sparse matrix theory, that aligns with the objective of finding a sparse factorization of the matrix. We evaluate our proposed method on both synthetic problem instances and on problems arising from the discretization of the Poisson equation on varying domains. Our experiments show that using data-driven preconditioners within the conjugate gradient method speeds up the convergence of the iterative procedure. The code is available at https://github.com/paulhausner/neural-incomplete-factorization.
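To illustrate how a Frobenius loss can be approximated with matrix-vector products only, the following is a minimal sketch, not the authors' implementation. It assumes a Hutchinson-style estimator: for a random vector w with identity covariance, the expected value of ||(L L^T - A) w||^2 equals ||L L^T - A||_F^2. The function names `frobenius_loss_exact` and `frobenius_loss_stochastic` and the use of dense NumPy arrays are illustrative assumptions; a practical version would presumably use sparse matrices and an automatic differentiation framework.

```python
import numpy as np


def frobenius_loss_exact(L, A):
    """Squared Frobenius norm of the residual L L^T - A (forms L L^T explicitly)."""
    R = L @ L.T - A
    return float(np.sum(R * R))


def frobenius_loss_stochastic(L, A, num_samples=1, rng=None):
    """Hutchinson-style estimate of ||L L^T - A||_F^2.

    Only matrix-vector products are used: L L^T is never formed.
    Assumption: w is drawn from a standard normal distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    total = 0.0
    for _ in range(num_samples):
        w = rng.standard_normal(n)
        # Residual applied to w, evaluated right to left as three mat-vecs.
        r = L @ (L.T @ w) - A @ w
        total += float(r @ r)
    return total / num_samples
```

With a single sample the estimate is noisy but unbiased, which is typically sufficient for stochastic gradient training; averaging more samples reduces the variance at proportional cost.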

Paper Structure (52 sections, 12 equations, 8 figures, 5 tables, 2 algorithms)

Introduction
Background
Conjugate gradient method
Convergence
Preconditioning
Stopping criterion
Graph neural networks
Method
Learning problem
Scalable training
Model architecture
Additional fill-ins and droppings
Inference and complexity
Results
Synthetic problems
...and 37 more sections

Figures (8)

Figure 1: Different representations of the problem matrix ${\bm{A}}$. The classical linear algebra representation as a matrix (left). The Coates graph representation of the lower-triangular matrix used for the first message-passing step in each block (middle). The second step is executed on the Coates graph corresponding to the upper triangular part of the matrix, which can be obtained by flipping the edges of the lower triangular graph (not shown). The unrolled graph of the message passing resulting in a concatenation of the two directed graph representations used for the message passing in the graph neural network (right). Color is used to visualize edge and matrix element correspondence. Diagonal elements are in bold. The node labels in the graph indicate the corresponding row and column of the matrix.
Figure 2: Total solving time for each test problem instance from the synthetic dataset.
Figure 3: Ordered eigenvalues of the preconditioned linear equation system in log-scale.
Figure 4: Pairwise comparison of total solving times (time to compute the preconditioner plus time to solve the preconditioned linear system) of the NeuralIF preconditioner with the other preconditioners without fill-ins on the 300 large PDE problem instances. Instances towards the lower right part of the plot indicate that our method is faster; instances towards the upper left indicate that the baseline is faster. Note that there are different axis scales for each comparison.
Figure 5: Comparison of the computational time required for the incomplete Cholesky and NeuralIF preconditioners with respect to the matrix size, measured in the number of non-zero elements, on both instances from the training distribution and problem instances outside of the training domain. We use 600 problem instances from the generated Poisson PDE problems. By construction, the generated outputs have the same number of non-zero elements as the input matrix.
...and 3 more figures
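
The graph construction described in the caption of Figure 1 can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the helper names `coates_edges` and `lower_and_upper_graphs` are hypothetical, the matrix is dense for simplicity, and the edge direction (entry (i, j) becomes an edge from node j to node i) is one common convention for the Coates graph that may differ from the one used in the paper.

```python
import numpy as np


def coates_edges(M):
    """Return one (source, target, weight) edge per non-zero entry of M.

    Assumed convention: entry M[i, j] becomes an edge from node j to node i.
    Diagonal entries become self-loops.
    """
    rows, cols = np.nonzero(M)
    return [(int(j), int(i), float(M[i, j])) for i, j in zip(rows, cols)]


def lower_and_upper_graphs(A):
    """Build the edge lists for the two message-passing steps of one block.

    The first list is the graph of the lower-triangular part of A. The second
    is obtained by flipping every edge, which for a symmetric A is the graph
    of the upper-triangular part.
    """
    lower = coates_edges(np.tril(A))
    upper = [(target, source, weight) for (source, target, weight) in lower]
    return lower, upper
```

Both edge lists have as many entries as the lower-triangular part of the matrix has non-zero elements, so message passing over them touches only the sparsity pattern of the input.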
