The autoregressive neural network architecture of the Boltzmann distribution of pairwise interacting spins systems

Indaco Biazzo

TL;DR

The paper addresses sampling Boltzmann distributions for binary pairwise spin systems by deriving an exact autoregressive neural network representation $P_B(\boldsymbol{x})=\prod_i P(x_i|\boldsymbol{x}_{<i})$ (the H2ARNN). The first-layer weights and biases are fixed by the Hamiltonian couplings $J_{ij}$ and fields $h_i$, and the construction reveals residual and recurrent structures that link network architecture to physical interactions. It then demonstrates physics-guided, polynomial-size ARNNs for Curie-Weiss and Sherrington-Kirkpatrick models by leveraging mean-field and replica-symmetry-breaking analyses, achieving superior approximations to Boltzmann distributions compared with standard baselines. The work highlights that the explicit H2ARNN form enables interpretable, problem-specific ARNN design and points to extensions to Potts models and sparse graphs, with practical implications for accelerated sampling and inference in complex systems.

Abstract

Generative Autoregressive Neural Networks (ARNNs) have recently demonstrated exceptional results in image and language generation tasks, contributing to the growing popularity of generative models in both scientific and commercial applications. This work presents an exact mapping of the Boltzmann distribution of binary pairwise interacting systems into autoregressive form. The resulting ARNN architecture has weights and biases of its first layer corresponding to the Hamiltonian's couplings and external fields, featuring widely used structures such as the residual connections and a recurrent architecture with clear physical meanings. Moreover, its architecture's explicit formulation enables the use of statistical physics techniques to derive new ARNNs for specific systems. As examples, new effective ARNN architectures are derived from two well-known mean-field systems, the Curie-Weiss and Sherrington-Kirkpatrick models, showing superior performance in approximating the Boltzmann distributions of the corresponding physics model compared to other commonly used architectures. The connection established between the physics of the system and the neural network architecture provides a means to derive new architectures for different interacting systems and interpret existing ones from a physical perspective.
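
The exact mapping described above can be checked numerically on a tiny system. The following is a minimal sketch, not the author's implementation. It assumes spins $x_i \in \{-1,+1\}$ and the convention $H(\boldsymbol{x}) = -\sum_{i<j} J_{ij} x_i x_j - \sum_i h_i x_i$; under that assumption the conditional takes the form $P(x_i=1|\boldsymbol{x}_{<i}) = \sigma\left(x^1_i + \log\rho^+ - \log\rho^-\right)$, where the skip term $x^1_i$ and the first-layer units are linear in $\boldsymbol{x}_{<i}$ with weights and biases given by $J$ and $h$. All function names are hypothetical, and the random couplings are illustrative only.

```python
import itertools
import numpy as np

def random_couplings(n, seed=0):
    # Illustrative random instance: symmetric J with zero diagonal, random fields h.
    rng = np.random.default_rng(seed)
    J = np.triu(rng.normal(size=(n, n)) / np.sqrt(n), 1)
    return J + J.T, rng.normal(size=n)

def boltzmann_table(J, h, beta):
    # Brute-force Boltzmann distribution over all 2^n configurations.
    # Assumed convention: H(x) = -sum_{i<j} J_ij x_i x_j - sum_i h_i x_i.
    n = len(h)
    configs = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    logw = np.array([beta * (0.5 * x @ J @ x + h @ x) for x in configs])
    p = np.exp(logw - logw.max())
    return configs, p / p.sum()

def brute_force_conditional(configs, probs, prefix):
    # P(x_i = +1 | x_<i) obtained by marginalising the full table.
    prefix = np.asarray(prefix, dtype=float)
    i = len(prefix)
    mask = np.all(configs[:, :i] == prefix, axis=1)
    up = mask & (configs[:, i] == 1)
    return probs[up].sum() / probs[mask].sum()

def log_rho(prefix, xi, J, h, beta):
    # Log-sum-exp over the 2^(n-i-1) configurations of the later spins x_>i.
    i = len(prefix)
    # First-layer units: field on each later spin l from x_<i (weights J, biases h).
    first_layer = h[i + 1:] + prefix @ J[:i, i + 1:]
    field = first_layer + xi * J[i, i + 1:]
    J_rest = J[i + 1:, i + 1:]
    terms = []
    for tail in itertools.product([-1, 1], repeat=len(h) - i - 1):
        t = np.array(tail, dtype=float)
        terms.append(beta * (field @ t + 0.5 * t @ J_rest @ t))
    return np.logaddexp.reduce(terms)

def h2arnn_conditional(prefix, J, h, beta):
    # sigma(skip term + log rho^+ - log rho^-), parameters fixed by J and h.
    prefix = np.asarray(prefix, dtype=float)
    i = len(prefix)
    skip = 2.0 * beta * (h[i] + prefix @ J[:i, i])
    z = skip + log_rho(prefix, 1.0, J, h, beta) - log_rho(prefix, -1.0, J, h, beta)
    return 1.0 / (1.0 + np.exp(-z))

def autoregressive_prob(x, J, h, beta):
    # Chain rule: P(x) = prod_i P(x_i | x_<i).
    p = 1.0
    for i, xi in enumerate(x):
        p_up = h2arnn_conditional(x[:i], J, h, beta)
        p *= p_up if xi == 1 else 1.0 - p_up
    return p

def max_conditional_error(n=5, beta=0.7, seed=0):
    # Largest gap between brute-force and closed-form conditionals.
    J, h = random_couplings(n, seed)
    configs, probs = boltzmann_table(J, h, beta)
    worst = 0.0
    for i in range(n):
        for prefix in itertools.product([-1, 1], repeat=i):
            exact = brute_force_conditional(configs, probs, prefix)
            worst = max(worst, abs(exact - h2arnn_conditional(prefix, J, h, beta)))
    return worst
```

The loop in `log_rho` enumerates every configuration of the later spins, so this sketch is only usable for very small $N$; that exponential cost is what the polynomial-size Curie-Weiss and Sherrington-Kirkpatrick architectures are meant to avoid.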

Paper Structure (10 sections, 25 equations, 6 figures)

Figures (6)

  • Figure 1: H$_2$ARNN architecture of a single Boltzmann conditional probability of a pairwise interacting Hamiltonian, Eq. (\ref{eq:H2ANN}). The $\mathbf{x}_{<i}$ variables are the input, and the output provides an estimate of the conditional probability $P(x_i=1 | \mathbf{x}_{<i})$. The first layer computes the $x^1_i$ and $x^1_{il}$ variables, see Eq. (\ref{eq:x_i_first}), where the weights and biases, directly related to the Hamiltonian parameters, are shown in orange. The non-linear operators are represented by square symbols. The width of the second layer increases exponentially with the system size. The $\log\sum\exp(\mathbf{x})=\log \sum_i e^{x_i}$ symbol represents the set of linear transformations and non-linear activation functions acting on the second layer. The last layer is the sigmoid function $\sigma$.
  • Figure 2: CW$_N$ and CW$_\infty$ architectures of a single conditional probability. Diagrams A and B represent the CW$_N$ and CW$_\infty$ architectures, respectively. Both diagrams involve the sum of the input variables $\mathbf{x}_{<i}$. A skip connection, composed of a shared weight (represented by the orange line), is also present in both cases. In the CW$_N$ architecture, $2(N-1)$ linear operations are applied (with fixed weights and biases, as indicated in Eq. (\ref{eq:x_i_first})), followed by two non-linear operations represented by $\log \sum \exp(x)$. In the CW$_\infty$ architecture, apart from the skip connection, the input variables undergo a $\mathrm{sgn}$ operation before being multiplied by a free weight parameter and passed through the final layer, represented by the sigmoid function $\sigma$. The number of parameters in the CW$_N$ architecture scales as $2N^2$, while in the CW$_\infty$ architecture it scales as $N+1$.
  • Figure 3: SK$_{RS/kRSB}$ architectures of the single-variable conditional probability. The diagram depicts the SK$_{RS/kRSB}$ architectures that approximate a single conditional probability of the Boltzmann distribution in the SK model. The input variables are $\mathbf{x}_{<i}$, and the output is the conditional probability $Q^{\text{RS/k-RSB}}\left(x_{i}=1|\mathbf{x}_{<i}\right)$. The non-linear operations are represented by squares and the linear operations by solid lines. The parameters, shown as orange lines, are equal to the Hamiltonian parameters and shared among the conditional probabilities, as indicated in Eq. (\ref{eq:x_i_first}). The depth of the network is determined by the level of approximation used, with the $Q^{\text{RS}}$ architecture having only one hidden layer and the $Q^{\text{k-RSB}}$ architecture having a sequence of $k+1$ hidden layers. The total number of parameters scales as $2(k+1)N^2 + \mathcal{O}(N)$, where the RS case corresponds to $k=0$.
  • Figure 4: Results for the CW model. The CW model considered has $J=1$ and $h=0$ (see the text for details). The system undergoes a second-order phase transition at $\beta=1$, where a spontaneous magnetization appears \cite{kadanoff2000statistical}. [A1, A2, A3] Relative error in the estimation of the free energy for different system sizes with respect to the analytic solution. The CW$_N$ architecture has its parameters fixed and precomputed analytically, and its error is too small to be seen at this scale. The y-axis is plotted on a logarithmic scale down to $10^{-4}$ and then linearly to zero. [B] The dependence on $N$ of the mean and maximum relative error of the two smaller architectures, $1P$ and CW$_{\infty}$, both of which scale linearly with the size of the system. [C] Distribution of the overlaps of the samples generated by the ARNNs for the CW system with $N=200$ variables and $\beta=1.3$.
  • Figure 5: Results for the SK model. The SK model considered has $J=1$ and $h=0$ (see the text for details). The system undergoes a phase transition at $\beta=1$ \cite{10.1142/0271}. [A1, A2, A3] Relative difference in the estimation of the free energy for increasing system sizes with respect to the free energy computed by the SK$_{2RSB}$ architecture. The results are averaged over 10 instances of the disorder. The y-axis is plotted on a logarithmic scale down to $10^{-4}$ and then linearly to $-10^4$. [B] Scaling with $N$ of the number of parameters of the ARNN architectures. [C] Distribution of the overlaps of the samples generated by the ARNN architectures for the SK model with $N=200$ variables and $\beta=1.5$, averaged over 10 different instances. The translucent error bands surrounding the plotted lines represent the $95\%$ confidence intervals.
  • ...and 1 more figures
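
The Figure 4 caption compares free-energy estimates against an analytic Curie-Weiss solution. One way to obtain an exact finite-$N$ reference of that kind is sketched below; whether the paper uses this exact expression is not stated in this excerpt. The sketch assumes the convention $H(\boldsymbol{x}) = -\frac{J}{N}\sum_{i<j} x_i x_j - h\sum_i x_i$ with $x_i \in \{-1,+1\}$, so the energy depends only on the total magnetization $M=\sum_i x_i$ through $\sum_{i<j} x_i x_j = (M^2-N)/2$. Function names are hypothetical.

```python
import itertools
import math
import numpy as np

def cw_log_partition(n, beta, J=1.0, h=0.0):
    # log Z as a sum over the number k of up spins (N + 1 terms instead of 2^N).
    # Assumed convention: H(x) = -(J/n) sum_{i<j} x_i x_j - h sum_i x_i.
    terms = []
    for k in range(n + 1):
        m = 2 * k - n  # total magnetization M
        log_mult = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        terms.append(log_mult + beta * (J * (m * m - n) / (2 * n) + h * m))
    return float(np.logaddexp.reduce(terms))

def cw_free_energy_per_spin(n, beta, J=1.0, h=0.0):
    # f = -log Z / (beta * N); requires beta > 0.
    return -cw_log_partition(n, beta, J, h) / (beta * n)

def brute_force_log_partition(n, beta, J=1.0, h=0.0):
    # Direct enumeration of all 2^n configurations, for cross-checking small n.
    terms = []
    for x in itertools.product([-1, 1], repeat=n):
        pair = sum(x[i] * x[j] for i in range(n) for j in range(i + 1, n))
        terms.append(beta * (J * pair / n + h * sum(x)))
    return float(np.logaddexp.reduce(terms))
```

For example, `cw_free_energy_per_spin(200, 1.3)` evaluates the reference at the system size and inverse temperature quoted for panel C, under the assumed convention.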