Convergence and Sketching-Based Efficient Computation of Neural Tangent Kernel Weights in Physics-Based Loss

Max Hirsch; Federico Pichi

TL;DR

The paper studies the convergence of gradient descent for physics-informed neural networks (PINNs) when the loss weights are chosen adaptively from the neural tangent kernel (NTK), and it addresses the resulting computational bottleneck with a randomized NTK estimator that combines a predictor-corrector approach with matrix sketching. It proves that, under reasonable assumptions, the average squared residuals $\tfrac{1}{T}\sum_{t=0}^{T-1} \|\mathcal{R}(\theta_t)\|^2$ and the average gradient norms converge to zero, even as the inner product $\Lambda(\theta_t)$ changes from step to step. To make frequent NTK-based reweighting affordable, it introduces a fast sketching-based NTK estimation scheme; a moving-average variant further reduces the overhead to roughly two extra network evaluations per step, and the estimates are unbiased up to a discretization error controlled by $\Delta t$. Numerical experiments on a wave-equation PINN and a nonlinear Q-tensor PINN corroborate the theory and demonstrate the practicality of the approach, including favorable comparisons to exact NTK-based weights and FEM baselines.
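As a rough illustration of the idea behind sketched NTK weights (this is not the authors' predictor-corrector algorithm, whose details are in the paper and its repository), the following minimal Python sketch estimates the trace of an NTK block $K = J J^\top$ from random probes, using the identity $\mathbb{E}\,\|J^\top z\|^2 = \operatorname{tr}(K)$ for probes with $\mathbb{E}[z z^\top] = I$. The explicit Jacobian `J`, the Rademacher probes, all function names, and the trace-ratio weighting rule $\lambda_i = \operatorname{tr}(K) / \operatorname{tr}(K_{ii})$ are assumptions made for this example.

```python
import random


def probe_estimate(J, z):
    """Return ||J^T z||^2, a one-probe estimate of tr(J J^T).

    J is an m x p residual Jacobian (list of rows); z is a length-m probe.
    """
    m, p = len(J), len(J[0])
    v = [sum(z[i] * J[i][j] for i in range(m)) for j in range(p)]  # v = J^T z
    return sum(x * x for x in v)


def exact_ntk_trace(J):
    """tr(J J^T) equals the squared Frobenius norm of J."""
    return sum(x * x for row in J for x in row)


def sketched_ntk_trace(J, num_probes, rng):
    """Average probe_estimate over independent Rademacher (+1/-1) probes."""
    m = len(J)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(m)]
        total += probe_estimate(J, z)
    return total / num_probes


def ntk_weights(block_traces):
    """Assumed trace-ratio rule: weight_i = (sum of block traces) / trace_i."""
    total = sum(block_traces)
    return [total / t for t in block_traces]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical Jacobians of two loss terms (e.g. PDE residual, boundary term).
    J_pde = [[1.0, 2.0, 0.5], [3.0, 4.0, -1.0]]
    J_bc = [[0.1, 0.0, 0.2]]
    traces = [sketched_ntk_trace(J, 200, rng) for J in (J_pde, J_bc)]
    print("estimated traces:", traces)
    print("exact traces:    ", [exact_ntk_trace(J) for J in (J_pde, J_bc)])
    print("weights:         ", ntk_weights(traces))
```

In a real network the product $J^\top z$ would not be formed from an explicit Jacobian; it is the gradient of the scalar $z^\top \mathcal{R}(\theta)$ with respect to $\theta$, so each probe costs about one backward pass. That is the reason a probe-based estimate can be much cheaper than assembling the full NTK.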

Abstract

In multi-objective optimization, multiple loss terms are weighted and added together to form a single objective. These weights are chosen to balance the competing losses according to some meta-goal. For example, in physics-informed neural networks (PINNs), the weights are often chosen adaptively to improve the network's generalization error. A popular choice of adaptive weights is based on the neural tangent kernel (NTK) of the PINN, which describes the evolution of the network in predictor space during training. The convergence of such an adaptive weighting algorithm is not clear a priori. Moreover, these NTK-based weights are updated frequently during training, which further increases the computational burden of the learning process. In this paper, we prove that, under appropriate conditions, gradient descent enhanced with adaptive NTK-based weights is convergent in a suitable sense. We then address the problem of computational efficiency by developing a randomized algorithm, inspired by a predictor-corrector approach and matrix sketching, which produces unbiased estimates of the NTK up to an arbitrarily small discretization error. Finally, we provide numerical experiments to support our theoretical findings and to show the efficacy of our randomized algorithm.

Code Availability: https://github.com/maxhirsch/Efficient-NTK
