Convergence rates for random feature neural network approximation in molecular dynamics

Xin Huang; Petr Plechac; Mattias Sandberg; Anders Szepessy

TL;DR

The proof uses a new derivation of the generalization error for random feature networks that does not apply the Rademacher or related complexities.

Abstract

Random feature neural network approximations of the potential in Hamiltonian systems yield approximations of molecular dynamics correlation observables that have the expected error $\mathcal{O}\big((K^{-1}+J^{-1/2})^{\frac{1}{2}}\big)$, for networks with $K$ nodes using $J$ data points, provided the Hessians of the potential and the observables are bounded. The loss function is based on the least squares error of the potential and regularizations, with the data points sampled from the Gibbs density. The proof uses an elementary new derivation of the generalization error for random feature networks that does not apply the Rademacher or related complexities.

Paper Structure (15 sections, 2 theorems, 134 equations, 9 figures)

Neural network approximations of Hamiltonian systems
Overview of main tools for proofs
The global/local error representation for Theorem \ref{thm}
The generalization error for the optimization problem \eqref{opt_J}
The error estimate for the remainder term of \eqref{generalization_error_v1}
The bound for the training error
Numerical Experiments
The computational model
Generalization error of the trained Fourier network
The approximation of correlation observables
Training data using different temperatures
Optimization of the regularized loss function $\mathcal{L}_{\mathrm{R}}$
Proof of the main theorems
Proof of Theorem \ref{thm:generalization}
Proof of Theorem \ref{thm}

Key Result

Theorem 1.1

Given the potential $V=\chi_0 V+v_e$ and $v=\chi_1 V$, with the splitting defined by the smooth cut-off functions in \eqref{eq:ve} and \eqref{eq:vi}, and denoting by $\rho_*$ the optimal density \eqref{rho_star_optimal}, we assume that there exists a constant $C>0$ such that [...]. Then the observable approximation \eqref{C_ab_NN}, based on the neural network optimization \eqref{eta_def} and \eqref{opt_J} for $\lambda_1=KJ^{-1/2}$, $\lambda_2=K^2J^{-1/2}$ a...

Figures (9)

Figure 3.1: The empirical testing loss with increasing number of nodes $K$ of the Fourier neural network, using training data set size $J=10^4$ and $J=10^5$, respectively.
Figure 3.2: Visualization of the target potential function $V(x)$, the reconstructed potential function $\Bar{v}_r(x)$ by the Fourier neural network with number of nodes $K=1024$ and data set size $J=10^5$, along with their pointwise difference.
Figure 3.3: Approximation of the position auto-correlation function $\Bar{\mathcal{C}}_{x_1,x_1}(\tau)$, using a Fourier neural network with $K=1024$ nodes, data set size $J=10^{5}$, $\beta=1$, and Monte Carlo sample size $M=2^{21}$.
Figure 3.4: Approximation of the momentum auto-correlation function $\Bar{\mathcal{C}}_{p_1,p_1}(\tau)$, using a Fourier neural network with $K=1024$ nodes, data set size $J=10^{5}$, $\beta=1$, and Monte Carlo sample size $M=2^{21}$.
Figure 3.5: The $L^1$-differences of the approximated auto-correlation functions, $\|\mathcal{C}_{x_1,x_1}(\tau)-\bar{\mathcal{C}}_{x_1,x_1}(\tau)\|_{L^1}$ and $\|\mathcal{C}_{p_1,p_1}(\tau)-\bar{\mathcal{C}}_{p_1,p_1}(\tau)\|_{L^1}$, with increasing number of nodes $K$ in the Fourier neural network. The training data set size is $J=10^5$, and the statistical uncertainties are evaluated with $Q=32$ independent replicas.
...and 4 more figures

Theorems & Definitions (8)

Theorem 1.1
Theorem 2.1
proof
Remark 4.1
proof
Remark 4.2
Remark 4.3
Remark 4.4
