Effective Field Neural Network

Xi Liu; Yujun Zhao; Chun Yu Wan; Yang Zhang; Junwei Liu

TL;DR

This work demonstrates that effective field neural networks (EFNNs) significantly outperform fully-connected deep neural networks (DNNs) and the effective model. With the help of convolution operations, an EFNN trained on a small system can be applied to a larger system without additional training, and the relative errors even decrease, which further demonstrates the efficacy of EFNNs in representing core physical behaviors.

Abstract

In recent years, with the rapid development of machine learning, physicists have been exploring new applications of it to alleviate the curse of dimensionality in many-body problems. To accurately reflect the underlying physics of the problem, domain knowledge must be encoded into the machine learning algorithms. In this work, inspired by field theory, we propose a new set of machine learning models called effective field neural networks (EFNNs) that can automatically and efficiently capture important many-body interactions through multiple self-refining processes. Taking the classical $3$-spin infinite-range model and the quantum double exchange model as case studies, we explicitly demonstrate that EFNNs significantly outperform fully-connected deep neural networks (DNNs) and the effective model. Furthermore, with the help of convolution operations, EFNNs trained on a small system can be applied to a larger system without additional training, and the relative errors even decrease, which further demonstrates the efficacy of EFNNs in representing core physical behaviors.

Paper Structure

This paper contains 7 equations, 5 figures.

Figures (5)

Figure 1: Energy evaluation of a 2D Ising model reformulated as a neural network. (a) Calculation of the effective field by summing the interacting spins. Each interacting spin is multiplied by its corresponding effective field to obtain independent spin values, and the total energy is determined by summing up these independent spins. (b) Neural network representation of the energy evaluation process. $\odot$ represents the element-wise multiplication.
Figure 2: Performance of EFNNs on a classical 3-spin infinite-range model. (a) Architecture of the EFNN. (b) Performance of EFNNs and DNNs on the test set. (c) Performance of EFNNs as the number of neurons increases.
Figure 3: Computational workflow of EFNNs with symmetrization. (a) Symmetrization: Each of the three channels of $S_{0}$ ($S_{0,x}$, $S_{0,y}$, $S_{0,z}$) undergoes convolution and element-wise multiplication within the same channel. The resulting products are summed to form the symmetrization layer $T_{j}(S_{0})$. Subsequently, $T_{j}(S_{0})$ is passed through a sequence of layers---convolution, batch normalization, $\tanh$ activation, and another convolution---to produce $F_{1}$ for $j=0$ or $g_{j}(T_{j}(S_{0}))$ for $j=1,\ldots,n$. (b) Generation of quasi-particle layers: The transformed symmetrization layer $g_{j}(T_{j}(S_{0}))$ is element-wise multiplied with the effective field layer $F_{j}$, resulting in the quasi-particle layer $S_{j}$, $j=1,\ldots,n$. (c) Transformation to effective field layers: Each quasi-particle layer $S_{k-1}$ is processed through convolution, batch normalization, $\tanh$ activation, and another convolution to generate the effective field layer $F_{k}$, for $k=2,\ldots,n$. (d) Energy evaluation: The final quasi-particle layer $S_{n}$ is transformed into a single-channel matrix. An element-wise summation of this matrix yields the scalar energy prediction $E$. These images are generated after training the model with $C=10$ channels and $n=2$ layers. Only the first five channels are displayed for clarity, using the 576th sample from the test set as input $S_{0}$.
Figure 4: Relative error of EFNNs. (a) Architecture of EFNN incorporating symmetrization layers $T_{j}$, $j=0,1,\ldots,n$. (b) Comparison of EFNNs' and the effective model's performance on a test set with lattice size $N=10$, for channel numbers $C=10,15,20,25$. (c) Performance of the EFNN model, trained on a $10\times 10$ lattice with $C=20$ channels, when applied to larger lattice systems.
Figure 5: Continued function representation of an EFNN with 3 FP layers. (a) In an EFNN, the initial layer $S_{0}$ is recursively integrated into every quasi-particle layer. After a mapping, $S_{0}$ is multiplied with subsequent layers, forming a continued function representation (see right-hand side equation). (b) A typical ResNet architecture: only one skip connection starts from $S_{0}$, and summation is used, which limits its renormalization capability. (c) Removing the $S_{0}$ connections transforms an EFNN into a standard DNN, where $S_{0}$ only appears at the beginning of the iterations, resulting in significantly diminished expressive power.
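
The energy evaluation described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes a standard nearest-neighbour 2D Ising model with periodic boundaries and a uniform coupling `J`, none of which is stated in the caption. The function names `effective_field`, `ising_energy`, and `ising_energy_bonds` are hypothetical. The factor 1/2 is needed under these assumptions because summing spin times field over all sites counts every bond twice.

```python
import numpy as np


def effective_field(spins):
    """Sum the four nearest neighbours of every site (periodic boundaries assumed)."""
    return (
        np.roll(spins, 1, axis=0)
        + np.roll(spins, -1, axis=0)
        + np.roll(spins, 1, axis=1)
        + np.roll(spins, -1, axis=1)
    )


def ising_energy(spins, J=1.0):
    """Energy as a sum over sites of spin times effective field.

    Assumed Hamiltonian: E = -J * sum over nearest-neighbour bonds of s_i * s_j.
    """
    field = effective_field(spins)
    independent = spins * field  # element-wise product, the role played by the odot symbol
    return -0.5 * J * independent.sum()  # 1/2 corrects the double counting of bonds


def ising_energy_bonds(spins, J=1.0):
    """Reference value: explicit sum over right and down bonds only."""
    right = spins * np.roll(spins, -1, axis=1)
    down = spins * np.roll(spins, -1, axis=0)
    return -J * (right.sum() + down.sum())


rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(10, 10))
# Both formulations should agree on any spin configuration.
assert np.isclose(ising_energy(spins), ising_energy_bonds(spins))
```

Because the effective field is built from a translation-invariant neighbour sum (a fixed convolution kernel), the same function applies unchanged to a lattice of any size. This is the property that the convolution-based EFNNs rely on when moving from a small lattice to a larger one.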