
Scaling Autoregressive Models for Lattice Thermodynamics

Xiaochen Du, Juno Nam, Sulin Liu, Rafael Gómez-Bombarelli

Abstract

Predicting how materials behave under realistic conditions requires understanding the statistical distribution of atomic configurations on crystal lattices, a problem central to alloy design, catalysis, and the study of phase transitions. Traditional Markov-chain Monte Carlo sampling suffers from slow convergence and critical slowing down near phase transitions, motivating the use of generative models that directly learn the thermodynamic distribution. Existing autoregressive models (ARMs), however, generate configurations in a fixed sequential order and incur high memory and training costs, limiting their applicability to realistic systems. Here, we develop a framework combining any-order ARMs, which generate configurations flexibly by conditioning on any known subset of lattice sites, with marginalization models (MAMs), which approximate the probability of any partial configuration in a single forward pass and substantially reduce memory requirements. This combination enables models trained on smaller lattices to be reused for sampling larger systems, while supporting expressive Transformer architectures with lattice-aware positional encodings at manageable computational cost. We demonstrate that Transformer-based any-order MAMs achieve more accurate free energies than multilayer perceptron-based ARMs on both the two-dimensional Ising model and CuAu alloys, faithfully capturing phase transitions and critical behavior. Overall, our framework scales from $10 \times 10$ to $20 \times 20$ Ising systems and from $2 \times 2 \times 4$ to $4 \times 4 \times 8$ CuAu supercells at reduced computational cost compared to conventional sampling methods.
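The two ideas the abstract combines, any-order generation and marginal probabilities of partial configurations, can be illustrated on a lattice small enough to enumerate exactly. The sketch below is not the paper's model: it assumes a $3 \times 3$ periodic Ising lattice with $k_B = 1$ and hypothetical helper names (`energy`, `marginal`, `any_order_log_prob`), and it uses brute-force sums where a MAM would use a learned network. It shows that exact marginals $p(x_S)$ of partial configurations give the same joint log-probability under any site ordering.

```python
import itertools
import math
import random

import numpy as np

# Toy settings (assumptions for illustration; the paper studies 10x10 and larger).
L = 3                      # lattice edge; L >= 3 so each periodic bond is counted once
N = L * L
J, B, T = 1.0, 0.0, 2.5    # coupling, external field, temperature (k_B = 1)


def energy(spins):
    """Nearest-neighbour Ising energy with periodic boundaries."""
    s = np.asarray(spins).reshape(L, L)
    bonds = np.sum(s * np.roll(s, 1, axis=0)) + np.sum(s * np.roll(s, 1, axis=1))
    return -J * bonds - B * s.sum()


# Exact Boltzmann distribution by enumerating all 2^N configurations.
configs = list(itertools.product((-1, 1), repeat=N))
configs_arr = np.array(configs)
weights = np.array([math.exp(-energy(c) / T) for c in configs])
Z = weights.sum()
probs = weights / Z
free_energy_per_site = -T * math.log(Z) / N


def marginal(known):
    """Exact p(x_S) for a partial configuration given as {site: spin}.

    A marginalization model would approximate this quantity in one forward pass.
    """
    mask = np.ones(len(configs), dtype=bool)
    for site, spin in known.items():
        mask &= configs_arr[:, site] == spin
    return probs[mask].sum()


def any_order_log_prob(x, order):
    """Chain rule over an arbitrary site order, using ratios of marginals."""
    known, logp = {}, 0.0
    for site in order:
        p_before = marginal(known)
        known[site] = x[site]
        logp += math.log(marginal(known) / p_before)
    return logp


x = configs[37]                                  # an arbitrary full configuration
raster_order = list(range(N))                    # fixed sequential order
shuffled_order = random.Random(0).sample(range(N), N)

lp_raster = any_order_log_prob(x, raster_order)
lp_shuffled = any_order_log_prob(x, shuffled_order)
print(lp_raster, lp_shuffled, math.log(probs[37]))
print("free energy per site:", free_energy_per_site)
```

Both orderings print the same log-probability as the directly enumerated one, because each conditional is a ratio of consistent marginals. For a learned model this consistency is not automatic, which is why it has to be encouraged during training.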

Paper Structure (47 sections, 17 equations, 5 figures)

Figures (5)

  • Figure 1: Overview of autoregressive (ARM) and marginalization-based (MAM) frameworks for scaling generative models of lattice thermodynamics. (a) Schematic of fixed-order ARM, any-order ARM, and any-order MAM. (b) Scaling approaches: out-painting extends smaller core regions; MAMs enable direct training on larger lattices via improved memory efficiency. (c) Schematic illustration of the out-painting procedure.
  • Figure 2: Architectural comparison on the $10 \times 10$ Ising model. (a) Schematic comparison of GNN and Transformer (TFM) architectures applied to lattice sites. (b--c) Free energy per site and deviation from $10\times10$ Wang--Landau reference at $B=0$. Deviations are multiplied by $10^4$ for visual clarity. (d) Spin--spin correlation functions at $T = T_c$ and $B=0$. (e--f) Specific heat capacity per site and deviation from $10\times10$ Wang--Landau reference also at $B=0$.
  • Figure 3: Out-painted and directly trained results for various ARM MLP and MAM Transformer models on $15\times 15$ and $20\times 20$ Ising lattices at $B = 0$ across temperatures. (a--b) Free-energy per site deviations from same-sized Wang--Landau reference (with cross-lattice-size comparisons). Values are multiplied by $10^4$ for visual clarity. (c--d) Specific heat capacity per site referenced to same-sized Wang--Landau energies.
  • Figure 4: CuAu alloy results. (a) Cu-Au ordered intermetallic phases observed. From left to right: (1) $\ce{Cu3Au}$, (2) CuAu, and (3) $\ce{CuAu3}$. (b) Free-energy per site comparison between models trained on the $2\times2\times4$ supercell and exact results from enumeration. (c--e) Predicted free-energy per site for the $4\times4\times4$ supercell at specific compositions compared with metadynamics reference. (f--g) Predicted temperature-composition phase diagram for $4\times4\times4$ and $4\times4\times8$ supercells compared with metadynamics reference.
  • Figure 5: Speed comparison between AO-ARM MLPs and AO-MAM Transformers. (a) Number of attempts and wall-clock time per $(T, B)$ condition for various Ising model lattice sizes, comparing trained and out-painted samples with MCMC and Wang--Landau reference sampling. Note that for Wang--Landau, the wall-clock time is per $B$ condition. (b) Number of attempts and wall-clock time per $(T, \mu)$ condition for various CuAu model lattice sizes, comparing trained and out-painted samples with MCMC and metadynamics reference sampling. Note that for metadynamics, the wall-clock time is per $T$ condition.