Practical Aspects on Solving Differential Equations Using Deep Learning: A Primer

Georgios Is. Detorakis

TL;DR

This work presents a practical primer on solving differential equations with the Deep Galerkin (DG) method, a neural-network-based extension of Galerkin approaches that minimizes a loss combining the differential operator residual, boundary conditions, and initial data. By leveraging automatic differentiation and sampling from domain-specific distributions, the DG framework solves PDEs, ODEs, and Fredholm integral equations without GPUs. The method is demonstrated on the 1D heat equation, exponential decay, FitzHugh–Nagumo dynamics, and a Fredholm second-kind problem, with reported mean absolute errors (MAEs) of approximately 0.0017, 0.0088, and 0.0134. The paper provides concrete PyTorch implementations, explores architectural variants (MLP, DGM, and ResNet-like blocks), analyzes the effects of batch size and batch normalization on convergence, and demonstrates hyperparameter optimization using Ray Tune and Optuna. Overall, the work offers an end-to-end workflow and guidance for running DG-based differential equation solvers on CPU-only machines, with a public GitHub repository for reproducibility. This positions the DG method as an accessible option for researchers who need flexible, implementation-friendly PDE, ODE, and integral equation solvers in scientific computing.

Abstract

Deep learning has become a popular tool across many scientific fields, including the study of differential equations, particularly partial differential equations. This work introduces the basic principles of deep learning and the Deep Galerkin method, which uses deep neural networks to solve differential equations. This primer aims to provide technical and practical insights into the Deep Galerkin method and its implementation. We demonstrate how to solve the one-dimensional heat equation step by step. We also show how to apply the Deep Galerkin method to systems of ordinary differential equations and to integral equations, such as the Fredholm integral equation of the second kind. Additionally, we provide code snippets within the text and the complete source code on GitHub. The examples are designed to run on an ordinary computer without a GPU.
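
To make the loss described above concrete, the following is a minimal PyTorch sketch of a Deep Galerkin style loss for a one-dimensional heat equation $y_t = \kappa\, y_{xx}$. It is not the paper's implementation. The network size, the diffusivity `KAPPA`, the domain $[0, \pi] \times [0, 1]$, the zero-flux boundary conditions, the uniform sampling, and the helper names `grad` and `dg_loss` are all assumptions chosen for illustration; only the initial condition $y(x, 0) = \cos(x)$ is taken from the text.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical small MLP mapping (x, t) -> y(x, t); sizes are assumptions.
net = nn.Sequential(
    nn.Linear(2, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

KAPPA, L, T = 1.0, math.pi, 1.0  # assumed diffusivity, rod length, time horizon


def grad(out, inp):
    """Per-sample d(out)/d(inp); keeps the graph so it can be differentiated again."""
    return torch.autograd.grad(
        out, inp, grad_outputs=torch.ones_like(out), create_graph=True
    )[0]


def dg_loss(net, n=128):
    # Interior points sampled uniformly from the space-time domain.
    x = (L * torch.rand(n, 1)).requires_grad_(True)
    t = (T * torch.rand(n, 1)).requires_grad_(True)
    y = net(torch.cat([x, t], dim=1))
    y_t = grad(y, t)
    y_xx = grad(grad(y, x), x)
    residual = y_t - KAPPA * y_xx  # differential operator residual

    # Initial condition y(x, 0) = cos(x).
    x0 = L * torch.rand(n, 1)
    y0 = net(torch.cat([x0, torch.zeros_like(x0)], dim=1))
    ic = y0 - torch.cos(x0)

    # Assumed zero-flux boundary conditions y_x = 0 at x = 0 and x = L.
    tb = T * torch.rand(n, 1)
    xb = torch.cat(
        [torch.zeros(n // 2, 1), torch.full((n - n // 2, 1), L)]
    ).requires_grad_(True)
    yb = net(torch.cat([xb, tb], dim=1))
    bc = grad(yb, xb)

    # Sum of mean squared residual, initial-condition, and boundary terms.
    return residual.pow(2).mean() + ic.pow(2).mean() + bc.pow(2).mean()


opt = torch.optim.Adam(net.parameters(), lr=1e-3)
losses = []
for step in range(200):  # a short CPU-only run, for illustration
    opt.zero_grad()
    loss = dg_loss(net)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Fresh points are drawn at every step, so no mesh is needed. The three terms are weighted equally here; in practice the relative weights are a tuning choice.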

Paper Structure (25 sections, 1 theorem, 28 equations, 12 figures, 2 algorithms)

Introduction
Notation & Terminology
Deep Learning
Feed-forward neural network
DGM neural network
Feed-forward ResNet
Neural Network Initialization
Backpropagation, Optimizers and Automatic Differentiation
Vanishing & Exploding Gradients
Batch Normalization
Universal Approximation Theorem
Solving Partial Differential Equations
Finite Differences
Galerkin Methods
Deep Galerkin Method
...and 10 more sections

Key Result

Theorem 1

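Let $\sigma$ be any continuous sigmoidal function. Then, the finite sums of the form

$$G({\bf x}) = \sum_{j=1}^{N} \alpha_j\, \sigma({\bf w}_j^{\top} {\bf x} + b_j), \qquad \alpha_j, b_j \in \mathbb{R},\ {\bf w}_j \in \mathbb{R}^n,$$

are dense in $C(I_n)$. In other words, given any $f\in C(I_n)$ and $\epsilon > 0$, there is a sum, $G({\bf x})$, of the above form, for which

$$|G({\bf x}) - f({\bf x})| < \epsilon \quad \text{for all } {\bf x} \in I_n.$$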

Figures (12)

Figure 1: Visual representation of PyTorch tensors. From left to right we see a scalar, $x \in \mathbb{R}$, a one-dimensional tensor (or vector) of dimension $d$, ${\bf x} \in \mathbb{R}^d$, a two-dimensional tensor (or matrix) ${\bf X} \in \mathbb{R}^{n \times m}$, and finally a three-dimensional tensor ${\bf X} \in \mathbb{R}^{n\times m \times d}$. The index starts from zero, following Python's convention.
Figure 2: Neural Network Architectures. (A) A feed-forward neural network with three hidden layers (gray color), one input (green color), and one output (red color) layer. The connections from the input to the first hidden layer are visible in the graph. (B) A DG-like neural network with two DG layers (teal color) and two fully connected (linear) layers (gray color). The inset shows the flow of information and the transformations within a DG layer. (C) A residual block of a ResNet. The input is distributed to the residual mapping (orange and teal colors) and to the output of the block via a skip connection.
Figure 3: Universal Approximation Theorem. A neural network with one hidden layer, three hidden units, and a $\tanh$ activation function approximates the function $f(x)= \sin(3x)$ in the interval $[-1, 1]$. Fifty samples of the function $f(x)$ used to train the neural network are shown here as black dots. The orange line indicates the approximation, $\hat{f}(x)$, provided by the neural network.
Figure 4: Domain and boundary schematics. The left panel shows a domain $\Omega$ of a partial differential equation in gray color and its boundary $\partial \Omega$ in red color. The right panel shows a rod of length $L$ (gray color), which is used to simulate heat diffusion (see main text). The boundary conditions $y(0, t)$ and $y(L, t)$ are depicted with red color fonts, and the initial condition $y(x, 0) = f(x) = \cos(x)$ with a blue sinusoidal line.
Figure 5: Finite differences scheme for a one-dimensional problem. The spatio-temporal discrete grid appears as circles, and the stencil of the finite differences is shown in orange color. The blue line indicates the initial conditions, and the red one shows the boundary conditions. The blue $j$s and black $i$s reflect the temporal and spatial discrete steps, respectively.
...and 7 more figures
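
The setting in the Figure 3 caption (one hidden layer, three $\tanh$ units, fifty samples of $f(x) = \sin(3x)$ on $[-1, 1]$) can be reproduced approximately with the short PyTorch sketch below. The equally spaced samples, the Adam optimizer, the learning rate, and the number of epochs are assumptions made for illustration and are not taken from the paper.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Fifty samples of f(x) = sin(3x) on [-1, 1]; equal spacing is an assumption.
x = torch.linspace(-1.0, 1.0, 50).unsqueeze(1)
f = torch.sin(3.0 * x)

# One hidden layer with three tanh units, as in the Figure 3 caption.
model = nn.Sequential(nn.Linear(1, 3), nn.Tanh(), nn.Linear(3, 1))

# Optimizer and hyperparameters are illustrative choices.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

history = []
for epoch in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), f)  # mean squared fitting error
    loss.backward()
    opt.step()
    history.append(loss.item())

# Approximation f_hat(x) evaluated on the training points.
with torch.no_grad():
    f_hat = model(x)
```

Plotting `f_hat` against `f` gives a picture comparable to the orange line and black dots described in the caption. The quality of the fit depends on the random initialization.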

Theorems & Definitions (1)

Theorem 1
