A Mini-Batch Method for Solving Nonlinear PDEs with Gaussian Processes

Xianjin Yang, Houman Owhadi

TL;DR

This work introduces a mini-batch framework for solving nonlinear PDEs with Gaussian processes (GPs). It addresses the cubic $O(N^3)$ cost of exact covariance-matrix inversion, where $N$ is the number of sample points, by performing each update on a small mini-batch of size $M$ together with a slack variable $\boldsymbol{z}$. The formulation yields a finite-dimensional representer expression and an $O(M^3)$ per-iteration cost. Under stability and weak convexity assumptions, the iterates converge to a near-stationary point at a rate of $O\left(\frac{1}{K}+\frac{1}{M}\right)$, where $K$ is the number of iterations. Numerical experiments on a nonlinear elliptic PDE and Burgers' equation show that, with appropriately sized mini-batches, the method achieves accuracy comparable to full GP solvers at significantly lower computational cost. Overall, the approach combines GP regression theory in reproducing kernel Hilbert spaces (RKHS) with stochastic proximal optimization to produce scalable, principled solvers for nonlinear PDEs, with potential for uncertainty quantification.
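To make the cost comparison concrete, the following is a minimal sketch and not the authors' solver. It assumes a Gaussian kernel, a small nugget for numerical stability, and plain function-value data rather than PDE constraints; the names `gaussian_kernel` and `kernel_solve` and all parameter values are hypothetical. It solves a kernel system on one mini-batch of $M$ points and reports the rough $(N/M)^3$ ratio between a full and a mini-batch factorization.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale=0.2):
    """Gaussian (RBF) kernel matrix between point sets X (n, d) and Y (m, d)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

def kernel_solve(X, y, nugget=1e-6):
    """Solve (K + nugget * I) w = y using a Cholesky factorization."""
    K = gaussian_kernel(X, X) + nugget * np.eye(len(X))
    L = np.linalg.cholesky(K)  # cubic cost in the number of rows of X
    return np.linalg.solve(L.T, np.linalg.solve(L, y))

rng = np.random.default_rng(0)
N, M = 1024, 64  # total sample count and batch size (illustrative values)
X = rng.uniform(size=(N, 2))
y = np.sin(np.pi * X[:, 0]) * np.sin(np.pi * X[:, 1])

# One mini-batch solve touches only an M x M kernel matrix.
batch = rng.choice(N, size=M, replace=False)
w_batch = kernel_solve(X[batch], y[batch])

# Rough ratio of factorization costs: full N x N versus one M x M batch.
cost_ratio = (N / M) ** 3
print(f"full / mini-batch factorization cost ratio: {cost_ratio:.0f}")
```

The ratio counts only a single factorization. A mini-batch method repeats the small solve over many iterations, so the overall saving depends on how many iterations are needed.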

Abstract

Gaussian process (GP) based methods for solving partial differential equations (PDEs) show great promise by bridging the gap between the theoretical rigor of traditional numerical algorithms and the flexible design of machine learning solvers. The main bottleneck of GP methods lies in the inversion of a covariance matrix, whose cost grows cubically with the number of samples. Drawing inspiration from neural networks, we propose a mini-batch algorithm combined with GPs to solve nonlinear PDEs. A naive deployment of stochastic gradient descent for solving PDEs with GPs is challenging, as the objective function in the requisite minimization problem cannot be expressed as the expectation of a finite-dimensional random function. To address this issue, we apply a mini-batch method to the corresponding infinite-dimensional minimization problem over function spaces. The algorithm takes a mini-batch of samples at each step to update the GP model, so the computational cost is spread across iterations. Using stability analysis and convexity arguments, we show that the mini-batch method steadily reduces a natural measure of errors towards zero at the rate of $O(1/K+1/M)$, where $K$ is the number of iterations and $M$ is the batch size.
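As a worked reading of the stated rate, suppose the error measure after $K$ iterations with batch size $M$ is bounded by $C_1/K + C_2/M$. The constants $C_1$ and $C_2$ are hypothetical and introduced here only for illustration, since the abstract gives just the order of the rate. Then, for a target tolerance $\varepsilon > 0$,

$$
\frac{C_1}{K} + \frac{C_2}{M} \le \varepsilon
\quad \text{whenever} \quad
K \ge \frac{2C_1}{\varepsilon}
\ \text{ and } \
M \ge \frac{2C_2}{\varepsilon}.
$$

Under this assumed bound, increasing $K$ alone leaves a floor of $C_2/M$, so the batch size limits the attainable accuracy while also setting the cost of each iteration.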

Paper Structure

This paper contains 13 sections, 10 theorems, 68 equations, 3 figures, 1 algorithm.

Key Result

Lemma 2.4

Let $\mathcal{N}=\{1, \dots, N\}$ and $\mathcal{I}\subset \mathcal{N}$. Let $\boldsymbol{\phi}$ be as in (nlprob) and let $\boldsymbol{\phi}_{\mathcal{I}}$ be the subset of $\boldsymbol{\phi}$ indexed by $\mathcal{I}$. Denote by $\ell^2(\mathcal{I})$ the set of sequences $\boldsymbol{a}=(a_{i})_{i \in \mathcal{I}}$ ... Furthermore, ...

Figures (3)

  • Figure 1: Nonlinear elliptic equation: results for the mini-batch GP method using $64$ points in each batch. The parameter $\eta=10^{-13}$. (a): a set of sample points and the contour of the true solution; (b): the convergence history, averaged across the subsequent 10 iterations (semi-log scale in $y$ axis); (c): the graph of the true solution $u^*$; (d): the numerical solution $u_{\text{MGP}}$ of the mini-batch GP method; (e): the contour of point-wise errors for our mini-batch GP method; (f): the contour of point-wise errors of the GP method of [chen2021solving].
  • Figure 2: Burgers' equation: results for the mini-batch GP method using $75$ points in each batch. The parameters $\eta=10^{-10}$, $\gamma=1$. (a): a set of sample points and the contour of the true solution; (b): the convergence history (semi-log scale in $y$ axis); (c): the graph of the true solution $u^*$; (d): the numerical solution $u_{\text{MGP}}$ of the mini-batch GP method; (e): the contour of point-wise errors.
  • Figure 3: Burgers' equation: averaged results over $10$ realizations for the mini-batch GP method using different batch sizes. The parameters $\eta=10^{-10}$, $\gamma=1$. (a): averaged loss histories (semi-log scale in $y$ axis); (b): the averaged point-wise errors when $M=75$; (c): the averaged point-wise errors when $M=150$; (d): the averaged point-wise errors when $M=300$.

Theorems & Definitions (25)

  • Remark 2.1
  • Remark 2.2
  • Remark 2.3
  • Lemma 2.4
  • Lemma 2.5
  • Remark 2.6
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • ...and 15 more