Sequential-in-time training of nonlinear parametrizations for solving time-dependent partial differential equations

Huan Zhang, Yifan Chen, Eric Vanden-Eijnden, Benjamin Peherstorfer

TL;DR

The paper establishes a unifying framework for sequential-in-time training of nonlinear parametrizations for time-dependent PDEs by classifying methods as either optimize-then-discretize (OtD) or discretize-then-optimize (DtO) schemes. It provides a posteriori error and stability analyses, highlights the tangent-space collapse phenomenon in OtD, and shows that DtO schemes are robust to this collapse at the cost of solving harder, nonconvex optimization problems. A key insight is that OtD dynamics project the PDE right-hand side onto the tangent space of the parametrization manifold, which connects OtD to the Dirac-Frenkel variational principle. DtO instead discretizes in time first, leading to boundary-value-like optimization steps. Under one-step Gauss-Newton, OtD approximates DtO to first order. The authors further relate OtD to gradient flows and natural gradient descent, showing that the choice of metric influences convergence and suggesting directions for designing efficient algorithms that exploit these geometric interpretations. Overall, the work clarifies how the two strategies interact, informs practical algorithm design, and opens avenues for combining OtD and DtO ideas with gradient-flow and information-geometric perspectives.
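The OtD idea summarized above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a linear advection equation $\partial_t u = -c\,\partial_x u$, a three-parameter Gaussian parametrization, a uniform sample grid, and forward Euler in time. The names `u_hat`, `jacobian`, `rhs`, and `otd_step` are hypothetical and chosen for illustration only.

```python
import numpy as np

def gauss(theta, x):
    # Unit-amplitude Gaussian profile shared by the functions below.
    _, mu, s = theta
    return np.exp(-(x - mu) ** 2 / (2.0 * s ** 2))

def u_hat(theta, x):
    # Assumed parametrization: theta = (amplitude a, center mu, width s).
    return theta[0] * gauss(theta, x)

def jacobian(theta, x):
    # Columns are the partial derivatives of u_hat w.r.t. a, mu, s
    # evaluated at the sample points x (shape: len(x) x 3).
    a, mu, s = theta
    g = gauss(theta, x)
    return np.stack(
        [g, a * g * (x - mu) / s**2, a * g * (x - mu) ** 2 / s**3], axis=1
    )

def rhs(theta, x, c=1.0):
    # Right-hand side f(x, u) = -c * du/dx of the assumed advection PDE,
    # using the analytic spatial derivative of the Gaussian.
    a, mu, s = theta
    du_dx = -a * gauss(theta, x) * (x - mu) / s**2
    return -c * du_dx

def otd_step(theta, x, dt, c=1.0):
    # OtD: first solve the least-squares projection of f onto the span of
    # the Jacobian columns (tangent space), then discretize in time.
    J = jacobian(theta, x)
    eta, *_ = np.linalg.lstsq(J, rhs(theta, x, c), rcond=None)
    return theta + dt * eta  # forward Euler on the parameter ODE

x = np.linspace(-5.0, 5.0, 200)      # assumed sample points
theta = np.array([1.0, 0.0, 0.5])    # initial (a, mu, s)
for _ in range(100):                 # integrate to T = 1 with dt = 0.01
    theta = otd_step(theta, x, dt=0.01)
print(theta)
```

In this example the exact solution stays inside the Gaussian family, so the projection reproduces the transport of the center `mu` at speed `c` while `a` and `s` stay fixed. If the Jacobian columns become nearly linearly dependent, the least-squares problem becomes ill-conditioned; the `rcond` argument of `np.linalg.lstsq` is one simple way to regularize it in a sketch like this.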

Abstract

Sequential-in-time methods solve a sequence of training problems to fit nonlinear parametrizations such as neural networks to approximate solution trajectories of partial differential equations over time. This work shows that sequential-in-time training methods can be understood broadly as either optimize-then-discretize (OtD) or discretize-then-optimize (DtO) schemes, which are well-known concepts in numerical analysis. The unifying perspective leads to novel stability and a posteriori error analysis results that provide insights into theoretical and numerical aspects that are inherent to either OtD or DtO schemes, such as the tangent space collapse phenomenon, which is a form of over-fitting. Additionally, the unified perspective facilitates establishing connections between variants of sequential-in-time training methods, which is demonstrated by identifying natural gradient descent methods on energy functionals as OtD schemes applied to the corresponding gradient flows.
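The identification of natural gradient descent with OtD can be made concrete with a short formal calculation. The sketch below assumes an $L^2$ inner product over a spatial domain $\Omega$, a PDE of the form $\partial_t u = f(\cdot, u)$, and an energy functional $E$ with first variation $\delta E/\delta u$. The symbols $M$, $\boldsymbol{\eta}$, and $\Omega$ are notation introduced here for illustration and need not match the paper's.

```latex
% OtD: project the right-hand side onto the tangent space of the parametrization
\dot{\boldsymbol{\theta}}(t) \in \arg\min_{\boldsymbol{\eta}}
  \left\| \nabla_{\boldsymbol{\theta}} \hat{u}(\boldsymbol{\theta}(t),\cdot)^{\top} \boldsymbol{\eta}
  - f\big(\cdot, \hat{u}(\boldsymbol{\theta}(t),\cdot)\big) \right\|_{L^2(\Omega)}^2

% Normal equations of this least-squares problem
M(\boldsymbol{\theta})\, \dot{\boldsymbol{\theta}}
  = \int_{\Omega} \nabla_{\boldsymbol{\theta}} \hat{u}(\boldsymbol{\theta},x)\,
    f\big(x, \hat{u}(\boldsymbol{\theta},x)\big)\, \mathrm{d}x,
\qquad
M(\boldsymbol{\theta}) = \int_{\Omega}
  \nabla_{\boldsymbol{\theta}} \hat{u}(\boldsymbol{\theta},x)\,
  \nabla_{\boldsymbol{\theta}} \hat{u}(\boldsymbol{\theta},x)^{\top}\, \mathrm{d}x

% Gradient flow: f = -\delta E / \delta u, followed by the chain rule
M(\boldsymbol{\theta})\, \dot{\boldsymbol{\theta}}
  = -\int_{\Omega} \nabla_{\boldsymbol{\theta}} \hat{u}(\boldsymbol{\theta},x)\,
    \frac{\delta E}{\delta u}\big(\hat{u}(\boldsymbol{\theta},\cdot)\big)(x)\, \mathrm{d}x
  = -\nabla_{\boldsymbol{\theta}} E\big(\hat{u}(\boldsymbol{\theta},\cdot)\big)
```

When $M(\boldsymbol{\theta})$ is invertible, a forward Euler step of size $\delta t$ gives $\boldsymbol{\theta}_{k+1} = \boldsymbol{\theta}_k - \delta t\, M(\boldsymbol{\theta}_k)^{-1} \nabla_{\boldsymbol{\theta}} E$, which has the form of a natural gradient update with metric $M$ and step size $\delta t$. A singular or nearly singular $M$ means the tangent vectors are (nearly) linearly dependent, which appears to be the regime the abstract calls tangent space collapse.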

Paper Structure (40 sections, 9 theorems, 70 equations)

Key Result

Proposition 1

Consider the time-dependent PDE (eq:Prelim:PDE) and let $\boldsymbol{\theta}(t)$ solve the continuous OtD dynamics (eq:projected_dynamics) so that $\hat{u}(\boldsymbol{\theta}(t),\cdot)$ approximates $u$. Assume that there exists a non-negative constant $C$ [...]. Furthermore, assume that there exists a function $\varepsilon: [0,T]\to [0,\infty)$ so that [...]. Then, [...]

Theorems & Definitions (18)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • ...and 8 more