Message Passing Variational Autoregressive Network for Solving Intractable Ising Models

Qunlong Ma, Zhi Ma, Jinlong Xu, Hairui Zhang, Ming Gao

TL;DR

The authors propose a variational autoregressive architecture with a message passing mechanism that uses the interactions between spin variables, while previous methods build on the correlations only.

Abstract

Many deep neural networks have been used to solve Ising models, including autoregressive neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks. Learning the probability distribution of energy configurations or finding the ground states of a disordered, fully connected Ising model is essential for statistical mechanics and NP-hard problems. Despite tremendous efforts, a neural network architecture that can solve these fully connected, extremely intractable problems with high accuracy on larger systems is still lacking. Here we propose a variational autoregressive architecture with a message passing mechanism, which can effectively utilize the interactions between spin variables. The new network, trained under an annealing framework, outperforms existing methods in solving several prototypical Ising spin Hamiltonians, especially for larger spin systems at low temperatures. The advantages also come from the substantial mitigation of mode collapse during the training of deep neural networks. Given how difficult these problems are, our method extends the current computational limits of unsupervised neural networks for solving combinatorial optimization problems.

Paper Structure (13 sections, 1 theorem, 30 equations, 14 figures, 3 tables)

Key Result

Corollary 1

The Hamiltonians message passing process reduces both $\mathbb{E}_{\textbf{s}\sim q_{\theta}(\textbf{s})}{E}(\textbf{s})$ and $\mathbb{E}_{\textbf{s}\sim q_{\theta}(\textbf{s})}\ln{q_{\theta}(\textbf{s})}$, and therefore yields a smaller variational free energy $F_q$ than training without message passing.
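
The two expectations in the corollary are the two terms of the variational free energy. The relation below is a sketch that assumes the standard definition used for variational autoregressive networks at inverse temperature $\beta>0$; the symbol $\beta$ does not appear in the text above and is introduced here only for illustration.

$$F_q=\frac{1}{\beta}\,\mathbb{E}_{\textbf{s}\sim q_{\theta}(\textbf{s})}\left[\beta E(\textbf{s})+\ln q_{\theta}(\textbf{s})\right]=\mathbb{E}_{\textbf{s}\sim q_{\theta}(\textbf{s})}E(\textbf{s})+\frac{1}{\beta}\,\mathbb{E}_{\textbf{s}\sim q_{\theta}(\textbf{s})}\ln q_{\theta}(\textbf{s})$$

Under this definition, and because $\beta>0$, lowering both expectations necessarily lowers $F_q$.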

Figures (14)

  • Figure 1: Histogram of the residual energy on the WPE with system size $N=60$ and difficulty parameter $\alpha=0.2$, a setting that makes problem instances hard to solve. The residual energy is defined as the difference between the energy of configurations sampled directly from the trained network and the energy of the ground state. Each method contributes $9\times10^{6}$ configurations, obtained from 30 instances with 30 runs each.
  • Figure 2: Schematic diagram of the MPVAN network architecture and four autoregressive message passing mechanisms, shown on a problem instance with 3 edges and 4 spins. The spins are labeled 1 to 4, and the node features are denoted $h_i$, $i=1, 2, 3, 4$. (a) The network architecture of MPVAN. The spin configuration $\textbf{s}\in\{\pm 1\}^N$ is the input to the network, $\hat{\textbf{s}}$ is the output, and $\textbf{h}$ denotes the hidden layer. ${\langle \textbf{s} \rangle_{MP}}$ and ${\langle \textbf{h} \rangle_{MP}}$ are obtained from $\textbf{s}$ and $\textbf{h}$ by message passing, respectively. A brown solid arrow indicates that neighboring nodes participate in message passing, while a brown dashed arrow indicates that neighboring nodes are connected but no message is passed between them, in order to preserve the autoregressive property. The $\{a_{ij}\}$ are the coefficients of the message passing process, which vary across message passing mechanisms. (b) The four autoregressive message passing mechanisms when updating $h_{3}$. Under the MP mechanism used in VAN [Wu2019VAN], message passing is not performed, which is equivalent to the identity transformation of ${h_{3}}$. Under the MP mechanism used in GCon-VAN [Panfeng2021], message passing is performed according to the adjacency matrix $A$, which updates ${h_{3}}$ based on the topology of the graph. For the Graph MP mechanism we designed, message passing uses the couplings $J_{ij}$ of the Hamiltonian, which updates $h_{3}$ based on the couplings and $Z_3=|J_{31}|+|J_{32}|$. The Hamiltonians MP mechanism we designed, which is the mechanism used in MPVAN, updates ${h_{3}}$ based on the couplings and the values of the neighboring spins $s_1$ and $s_2$.
  • Figure 3: The negative entropy during training with $N_{annealing}=25$ and $N_{training}=100$, on the WPE with $N=30$ and $\alpha=0.2$, averaged over 10 runs.
  • Figure 4: Residual energy per site of MPVAN and benchmark methods as a function of system size $N$. (a) On the WPE, $\epsilon_{res}/N$ is averaged over 30 instances with 30 runs each; all instances have system size $N$ and $\alpha=0.2$. When $N\geq 50$, the problem instances cannot be solved due to rough energy landscapes. (b) On the SK model, the residual energy per site is averaged over 30 instances with 10 runs each. Since the energy of the ground state cannot be determined, we use the lowest energy found across MPVAN, VAN, SA, and PT in its place. Due to computational limitations, we exclude VCA from the comparison when $N>100$, as it is about $N/10$ times slower than MPVAN when trained with the same hyperparameters. More details on the computational speed of MPVAN and other methods can be found in Appendix \ref{appen5}.
  • Figure 5: On variants of the SK model with $N=200$, residual energy per site of MPVAN and benchmark methods as a function of the average node degree of the graph, averaged over 30 randomly generated instances with 10 runs each.
  • ...and 9 more figures
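
The caption of Figure 2 describes message passing that uses the couplings $J_{ij}$ while only letting earlier spins inform later ones. The sketch below shows one way such an update could look; it is not the paper's exact rule. The update forms `h + a @ h` and `h + a @ (s * h)`, the function names, and the 4-spin instance are all assumptions made for illustration. Only the coupling-based coefficients, the normalizer $Z_i=\sum_{j<i}|J_{ij}|$ (matching $Z_3=|J_{31}|+|J_{32}|$), and the restriction to preceding spins follow the caption.

```python
import numpy as np

def mp_coefficients(J):
    """Assumed coefficients a_ij = J_ij / Z_i for j < i, with Z_i = sum_{j<i} |J_ij|."""
    lower = np.tril(np.asarray(J, dtype=float), k=-1)  # keep j < i only (autoregressive order)
    Z = np.abs(lower).sum(axis=1, keepdims=True)
    Z[Z == 0.0] = 1.0  # no preceding neighbors: the row stays all zeros
    return lower / Z

def graph_mp(J, h):
    """Coupling-weighted update h_i + sum_{j<i} a_ij h_j (assumed form)."""
    return h + mp_coefficients(J) @ h

def hamiltonians_mp(J, h, s):
    """Update that also uses preceding spin values: h_i + sum_{j<i} a_ij s_j h_j (assumed form)."""
    return h + mp_coefficients(J) @ (s * h)

# Hypothetical 4-spin instance with 3 edges: (1,3), (2,3), (3,4) in 1-based labels.
J = np.array([[0.0,  0.0,  1.0, 0.0],
              [0.0,  0.0, -2.0, 0.0],
              [1.0, -2.0,  0.0, 0.5],
              [0.0,  0.0,  0.5, 0.0]])
h = np.array([1.0, 2.0, 3.0, 4.0])    # one scalar feature per node
s = np.array([1.0, -1.0, 1.0, -1.0])  # spin values in {+1, -1}

a = mp_coefficients(J)
updated = hamiltonians_mp(J, h, s)
```

In this sketch, flipping `s[2]` changes only `updated[3]`, since entry `i` reads spins at positions `j < i` only. This is the kind of dependency structure the dashed arrows in Figure 2 are meant to protect.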

Theorems & Definitions (2)

  • Corollary 1
  • proof