
Autoregressive model path dependence near Ising criticality

Yi Hong Teoh, Roger G. Melko

TL;DR

This paper studies the reconstruction of critical correlations in the two-dimensional (2D) Ising model using RNNs and transformers trained on binary spin data obtained near the thermal phase transition. It finds that autoregressive paths with long 1D segments train these models more efficiently than space-filling curves that better preserve 2D locality.

Abstract

Autoregressive models are a class of generative models that probabilistically predict the next output of a sequence based on previous inputs. The autoregressive sequence is by definition one-dimensional (1D), which is natural for language tasks and hence an important component of modern architectures like recurrent neural networks (RNNs) and transformers. However, when language models are used to predict outputs on physical systems that are not intrinsically 1D, the question arises of which choice of autoregressive sequence, if any, is optimal. In this paper, we study the reconstruction of critical correlations in the two-dimensional (2D) Ising model, using RNNs and transformers trained on binary spin data obtained near the thermal phase transition. We compare the training performance for a number of different 1D autoregressive sequences imposed on finite-size 2D lattices. We find that paths with long 1D segments train the autoregressive models more efficiently than space-filling curves that better preserve the 2D locality. Our results illustrate the potential importance of choosing the optimal autoregressive sequence ordering when training modern language models for tasks in physics.
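To make the idea of imposing a 1D sequence on a 2D lattice concrete, the following is a minimal sketch of three common site orderings for an $L \times L$ lattice. The function names (`raster_path`, `snake_path`, `hilbert_path`, `flatten`, `longest_straight_run`) are hypothetical, and the constructions are standard textbook ones; they are assumptions and need not coincide with the parametric path definitions used in the paper.

```python
import numpy as np


def raster_path(L):
    """Row-major order: every row is scanned left to right."""
    return [(x, y) for y in range(L) for x in range(L)]


def snake_path(L):
    """Boustrophedon order: consecutive rows are scanned in opposite directions."""
    path = []
    for y in range(L):
        xs = range(L) if y % 2 == 0 else range(L - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path


def hilbert_path(L):
    """Hilbert space-filling curve (standard index-to-coordinate map).

    L must be a power of two.
    """
    if L < 1 or L & (L - 1):
        raise ValueError("L must be a power of two")
    path = []
    for d in range(L * L):
        x = y = 0
        t, s = d, 1
        while s < L:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:  # rotate/reflect the current sub-square
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        path.append((x, y))
    return path


def flatten(spins, path):
    """Read a 2D configuration spins[y, x] as a 1D sequence along a path."""
    return np.array([spins[y, x] for x, y in path])


def longest_straight_run(path):
    """Largest number of consecutive unit steps taken in the same direction."""
    best = run = 0
    prev = None
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        is_unit = abs(step[0]) + abs(step[1]) == 1
        if is_unit:
            run = run + 1 if step == prev else 1
            prev = step
        else:  # a jump between non-neighbouring sites breaks the run
            run, prev = 0, None
        best = max(best, run)
    return best
```

In this sketch the snake ordering moves in straight runs spanning a full row, while the Hilbert ordering keeps every step between nearest neighbours but changes direction after only a few sites. `longest_straight_run` gives one simple, illustrative way to quantify that difference; it is not a metric taken from the paper.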

Paper Structure (4 sections, 3 equations, 4 figures)

Figures (4)

  • Figure 1: Autoregressive paths traversing a 2D square lattice. The color represents the parameter $t$, which is varied from 0 to 1 to generate each path from its parametric equation.
  • Figure 2: Training results for the 2D Ising model near criticality at $\beta = 0.435$ with linear size $L=8$ for (a) the 1D RNN and (b) the transformer. The critical point of the system is at $\beta_c \approx 0.4407$. We train both model architectures with multiple autoregressive paths (listed in Fig. 1). The shaded area represents the standard error over 10 independent training runs. While the transformer is trained with the zigzag and Hilbert paths, we track the two-point spin-spin correlation function, $\mathcal{G}(\Delta x, \Delta y)$. In (c), we plot this correlation function after 61, 211, 241, and 3200 training epochs.
  • Figure 3: Training of the transformer with the zigzag path on the 2D Ising model near criticality ($\beta = 0.435$) for linear sizes $L = 4$, $8$, and $16$.
  • Figure 4: Training of the transformer across the 2D Ising model phase transition, with $\beta$ between 0.286 and 0.667. The theoretical critical point for this system is at $\beta_c \approx 0.4407$.
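
The two-point spin-spin correlation function $\mathcal{G}(\Delta x, \Delta y)$ tracked in Figure 2 can be estimated directly from sampled configurations. The sketch below assumes the unconnected, translation-averaged form $\mathcal{G}(\Delta x, \Delta y) = \frac{1}{L^2} \sum_{x,y} \langle s_{x,y}\, s_{x+\Delta x,\, y+\Delta y} \rangle$ with periodic boundary conditions. The paper's exact definition is not given in this summary, so this form, along with the names `two_point_correlation` and `BETA_C`, is an assumption. The constant uses the known exact square-lattice critical coupling $\beta_c = \tfrac{1}{2}\ln(1+\sqrt{2})$ (for $J = k_B = 1$), which matches the quoted $\beta_c \approx 0.4407$.

```python
import numpy as np

# Exact critical inverse temperature of the 2D square-lattice Ising model
# (J = k_B = 1); approximately 0.4407.
BETA_C = 0.5 * np.log(1.0 + np.sqrt(2.0))


def two_point_correlation(samples):
    """Estimate G[dy, dx] from spin samples.

    samples: array of shape (n_samples, L, L) with entries +1 or -1.
    Assumes periodic boundaries and averages over samples and lattice sites.
    """
    samples = np.asarray(samples, dtype=float)
    _, L, _ = samples.shape
    G = np.empty((L, L))
    for dy in range(L):
        for dx in range(L):
            # shifted[n, y, x] == samples[n, (y + dy) % L, (x + dx) % L]
            shifted = np.roll(samples, shift=(-dy, -dx), axis=(1, 2))
            G[dy, dx] = np.mean(samples * shifted)
    return G
```

Under these assumptions, the same estimator can be applied both to the training data and to configurations sampled from a trained model, and the two arrays compared entry by entry.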