
Convergence rates for random feature neural network approximation in molecular dynamics

Xin Huang, Petr Plechac, Mattias Sandberg, Anders Szepessy

TL;DR

The proof uses a new derivation of the generalization error for random feature networks that does not apply the Rademacher or related complexities.

Abstract

Random feature neural network approximations of the potential in Hamiltonian systems yield approximations of molecular dynamics correlation observables that have the expected error $\mathcal{O}\big((K^{-1}+J^{-1/2})^{\frac{1}{2}}\big)$, for networks with $K$ nodes using $J$ data points, provided the Hessians of the potential and the observables are bounded. The loss function is based on the least squares error of the potential and regularizations, with the data points sampled from the Gibbs density. The proof uses an elementary new derivation of the generalization error for random feature networks that does not apply the Rademacher or related complexities.
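The setup described in the abstract can be illustrated with a minimal one-dimensional sketch: draw $J$ data points from the Gibbs density $\propto e^{-\beta V}$, sample $K$ random frequencies, and fit the amplitudes by regularized least squares. Everything specific here is an assumption made for illustration and is not taken from the paper: the test potential `V`, the rejection sampler `sample_gibbs`, the Gaussian frequency distribution, the real cosine/sine features, and the single ridge penalty `lam`.

```python
import numpy as np

rng = np.random.default_rng(0)

def V(x):
    # Hypothetical 1D test potential (assumption, not the paper's potential).
    return 0.5 * x**2 + 0.3 * np.cos(3.0 * x)

def sample_gibbs(potential, beta, n, rng, x_max=5.0):
    """Approximate rejection sampling from exp(-beta * potential) on [-x_max, x_max]."""
    grid = np.linspace(-x_max, x_max, 2001)
    v_min = potential(grid).min()  # grid estimate of the minimum of the potential
    out = np.empty(0)
    while out.size < n:
        x = rng.uniform(-x_max, x_max, size=2 * n)
        accept = rng.uniform(size=x.size) < np.exp(-beta * (potential(x) - v_min))
        out = np.concatenate([out, x[accept]])
    return out[:n]

def features(x, omega):
    """Constant feature plus cos/sin random Fourier features, shape (len(x), 2K + 1)."""
    z = np.outer(x, omega)
    return np.hstack([np.ones((x.size, 1)), np.cos(z), np.sin(z)])

def fit_ridge(x, y, omega, lam):
    """Minimize (1/J) * ||Phi @ eta - y||^2 + lam * ||eta||^2 in closed form."""
    Phi = features(x, omega)
    J, m = Phi.shape
    A = Phi.T @ Phi / J + lam * np.eye(m)
    b = Phi.T @ y / J
    return np.linalg.solve(A, b)

K, J, beta, lam = 64, 2000, 1.0, 1e-6
omega = rng.normal(0.0, 2.0, size=K)      # random frequencies (assumed Gaussian)

x_train = sample_gibbs(V, beta, J, rng)   # data points from the Gibbs density
eta = fit_ridge(x_train, V(x_train), omega, lam)

x_test = sample_gibbs(V, beta, J, rng)    # independent test sample
y_test = V(x_test)
test_mse = np.mean((features(x_test, omega) @ eta - y_test) ** 2)
print(f"test mean squared error: {test_mse:.3e}")
```

The loss in the paper uses regularization terms with the parameters $\lambda_1$ and $\lambda_2$ quoted in Theorem 1.1; the single ridge penalty above is a simplification chosen so that the minimizer has a closed form. Only the frequencies are random, while the amplitudes `eta` are the trained quantities, which is what makes the fit a linear least squares problem.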

Paper Structure (15 sections, 2 theorems, 134 equations, 9 figures)

Key Result

Theorem 1.1

Given the potential $V=\chi_0 V+v_e$ and $v=\chi_1 V$, with the splitting defined by the smooth cut-off functions (eq:ve) and (eq:vi), and denoting by $\rho_*$ the optimal density (rho_star_optimal), we assume that there exists a constant $C>0$ such that … Then the observable approximation (C_ab_NN), based on the neural network optimization (eta_def) and (opt_J) for $\lambda_1=KJ^{-1/2}$, $\lambda_2=K^2J^{-1/2}$ a…

Figures (9)

  • Figure 3.1: The empirical testing loss with increasing number of nodes $K$ of the Fourier neural network, using training data set size $J=10^4$ and $J=10^5$, respectively.
  • Figure 3.2: Visualization of the target potential function $V(x)$, the reconstructed potential function $\Bar{v}_r(x)$ by the Fourier neural network with number of nodes $K=1024$ and data set size $J=10^5$, along with their pointwise difference.
  • Figure 3.3: Approximation of the position auto-correlation function $\Bar{\mathcal{C}}_{x_1,x_1}(\tau)$, using Fourier neural network with $K=1024$ nodes, data set size $J=10^{5}$, $\beta=1$, and Monte Carlo sample size $M=2^{21}$.
  • Figure 3.4: Approximation of the momentum auto-correlation function $\Bar{\mathcal{C}}_{p_1,p_1}(\tau)$, using Fourier neural network with $K=1024$ nodes, data set size $J=10^{5}$, $\beta=1$, and Monte Carlo sample size $M=2^{21}$.
  • Figure 3.5: The $L^1$-difference of the approximated auto-correlation function $\|\mathcal{C}_{x_1,x_1}(\tau)-\bar{\mathcal{C}}_{x_1,x_1}(\tau)\|_{L^1}$ and $\|\mathcal{C}_{p_1,p_1}(\tau)-\bar{\mathcal{C}}_{p_1,p_1}(\tau)\|_{L^1}$, with increasing number of nodes $K$ in the Fourier neural network. The training data set size $J=10^5$, and the statistical uncertainties are evaluated with $Q=32$ independent replicas.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Theorem 1.1
  • Theorem 2.1
  • proof
  • Remark 4.1
  • proof
  • Remark 4.2
  • Remark 4.3
  • Remark 4.4