Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

Yifan Chen; Houman Owhadi; Florian Schäfer

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

Yifan Chen, Houman Owhadi, Florian Schäfer

TL;DR

A fast, scalable, and accurate method for solving general PDEs with GPs and kernel methods that integrates sparse Cholesky factorizations into optimization algorithms to obtain fast solvers of the nonlinear PDE.

Abstract

In recent years, there has been widespread adoption of machine learning-based approaches to automate the solving of partial differential equations (PDEs). Among these approaches, Gaussian processes (GPs) and kernel methods have garnered considerable interest due to their flexibility, robust theoretical guarantees, and close ties to traditional methods. They can transform the solving of general nonlinear PDEs into solving quadratic optimization problems with nonlinear, PDE-induced constraints. However, the complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel, and its \textit{partial derivatives}, a result of the PDE constraint and for which fast algorithms are scarce. The primary goal of this paper is to provide a near-linear complexity algorithm for working with such kernel matrices. We present a sparse Cholesky factorization algorithm for these matrices based on the near-sparsity of the Cholesky factor under a novel ordering of pointwise and derivative measurements. The near-sparsity is rigorously justified by directly connecting the factor to GP regression and exponential decay of basis functions in numerical homogenization. We then employ the Vecchia approximation of GPs, which is optimal in the Kullback-Leibler divergence, to compute the approximate factor. This enables us to compute $ε$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/ε))$ in space and $O(N\log^{2d}(N/ε))$ in time. We integrate sparse Cholesky factorizations into optimization algorithms to obtain fast solvers of the nonlinear PDE. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations.

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

TL;DR

Abstract

-approximate inverse Cholesky factors of the kernel matrices with complexity

in space and

in time. We integrate sparse Cholesky factorizations into optimization algorithms to obtain fast solvers of the nonlinear PDE. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations.

Paper Structure (40 sections, 14 theorems, 129 equations, 11 figures, 1 table, 3 algorithms)

This paper contains 40 sections, 14 theorems, 129 equations, 11 figures, 1 table, 3 algorithms.

Introduction
The context
Contributions and organizations
Related work
Machine learning PDEs
Fast solvers for kernel matrices
Screening effects in spatial statistics
Solving nonlinear PDEs via GPs
The GP framework
The finite dimensional problem
The general case
The sparse Cholesky factorization algorithm
The case of derivative-free measurements
Reordering
Sparsity pattern
...and 25 more sections

Key Result

Theorem 4.1

Under the setting in Subsection sec: theory, set-up and the above given ordering $P$, we consider the upper triangular Cholesky factorization $\Theta^{-1}=U^\star {U^\star}^T$. Then, for $1\leq i\leq j \leq N$, we have where $C$ is a generic constant that depends on $\Omega, \delta(\{\mathbf{x}_i\}_{i\in I}; \partial\Omega), d, s, J, \|\mathcal{L}\|, \|\mathcal{L}^{-1}\|$.

Figures (11)

Figure 1: Demonstration of screening effects in the context of Diracs measurements using the Matérn kernel with $\nu=5/2$ and lengthscale $0.3$. The data points are equidistributed in $[0,1]^2$ with grid size $h=0.02$. In the left figure, we display the $1000$th point (the big point) in the maximin ordering with $\mathsf{A} = \emptyset$, where all points ordered before it (i.e., $i<1000$) are colored with intensity according to the corresponding $|U_{ij}^\star|$. The right figure is generated in the same manner but for the $2000$th point in the ordering.
Figure 2: Demonstration of screening effects in the context of derivative-type measurements using the Matérn kernel with $\nu = 5/2$ and lengthscale $0.3$. The data points are equidistributed in $[0,1]^2$ with grid size $h=0.02$. In the left figure, we order the Laplacian measurements first and then select a Diracs measurement which is the big point. The points are colored with intensity according to $|U_{ij}^\star|$. In the right figure, we order the Dircas measurements first and then select a Laplacian measurement; we display things in the same manner as the left figure.
Figure 3: Demonstration of the accuracy of the sparse Cholesky factorization for $K({\bm{\phi}},{\bm{\phi}})^{-1}$ in the nonlinear elliptic PDE example. In the left figure, we choose Matérn kernels with $\nu = 5/2,7/2,9/2$ and lengthscale $l=0.3$; the physical points are fixed to be equidistributed in $[0,1]^2$ with grid size $h=0.05$; we plot the error measured in the KL sense with regard to different $\rho$. In the right figure, we fix the Matérn kernels with $\nu = 5/2$ and lengthscale $l=0.3$. We vary the number of physical points, which are equidistributed with grid size $h=0.04, 0.02, 0.01$; thus $N_{\mathrm{domain}} = 625, 2500, 10000$ correspondingly.
Figure 4: In the left figure, we choose Matérn kernels with $\nu = 5/2,7/2,9/2$ and lengthscale $l=0.3$; the physical points are equidistributed in $[0,1]^2$ with different grid sizes; we plot the CPU time of the factorization algorithm for $K({\bm{\phi}},{\bm{\phi}})^{-1}$, using the personal computer MacBook Pro 2.4 GHz Quad-Core Intel Core i5. In the right figure, we study the sparse Cholesky factorization for the reduced kernel matrix $K({\bm{\phi}}^k, {\bm{\phi}}^k)^{-1}$. We fix the Matérn kernels with $\nu = 5/2$ and lengthscale $l=0.3$. We vary the number of physical points, which are equidistributed with grid size $h=0.04, 0.02, 0.01$. We plot the KL error with regard to different $\rho$.
Figure 5: Demonstration of screening effects for the reduced kernel matrix. We choose the Matérn kernel with $\nu = 5/2$; the lengthscale parameter is $0.3$. The data points are equidistributed in $[0,1]^2$ with grid size $h=0.02$. In the left figure, we show a boundary point, and all the points ordered before are marked with a color whose intensity scales with the entry value $|U_{ij}^\star|$. The right figure is obtained in the same manner but for an interior measurement.
...and 6 more figures

Theorems & Definitions (38)

Remark 2.1
Definition 3.1
Definition 3.2: Conditioned Maximin Ordering
Remark 3.3
Remark 3.4
Remark 3.5
Remark 3.6
Remark 3.7
Remark 3.8
Theorem 4.1
...and 28 more

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

TL;DR

Abstract

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (11)

Theorems & Definitions (38)