Performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models

Luca Maria Del Bono, Federico Ricci-Tersenghi, Francesco Zamponi

TL;DR

This work develops a theoretical understanding of machine-learning-assisted Monte Carlo sampling for simple statistical physics models, focusing on Sequential/Global Annealing with a shallow MADE applied to the Curie-Weiss model. It gives an analytic description of the optimal MADE weights, their behavior in the thermodynamic limit, and the gradient-based training dynamics, revealing an analogue of critical slowing down in learning at the critical point. It then benchmarks the Global Annealing procedure against standard local Metropolis Monte Carlo, showing that a perfectly trained MADE can eliminate the need for local moves for small temperature steps, whereas an imperfectly trained one benefits from added local updates, as quantified by first-passage times. The results offer a principled framework for combining neural-network proposals with traditional MC, informing practical annealing schedules and the trade-off between training time and sampling efficiency in multi-state systems.

Abstract

Recent years have seen a rise in the application of machine learning techniques to aid the simulation of hard-to-sample systems that cannot be studied using traditional methods. Despite the introduction of many different architectures and procedures, a wide theoretical understanding is still lacking, with the risk of suboptimal implementations. As a first step to address this gap, we provide here a complete analytic study of the widely-used Sequential Tempering procedure applied to a shallow MADE architecture for the Curie-Weiss model. The contribution of this work is twofold: firstly, we give a description of the optimal weights and of the training under Gradient Descent optimization. Secondly, we compare what happens in Sequential Tempering with and without the addition of local Metropolis Monte Carlo steps. We are thus able to give theoretical predictions on the best procedure to apply in this case. This work establishes a clear theoretical basis for the integration of machine learning techniques into Monte Carlo sampling and optimization.

Paper Structure

This paper contains 23 sections, 41 equations, 10 figures, 1 algorithm.

Figures (10)

  • Figure 1: Behavior of the optimal couplings $J_\ell^*/\ell$ as a function of $\beta$ for $\ell \leq 10$. (a) Finite $N$ ($N = 20$), obtained by solving Eq. \ref{eq:optimal_weights}. (b) Infinite $N$, obtained by solving Eq. \ref{eq:optimal_weights_Ninf}.
  • Figure 2: Comparison at $\beta = \beta_c = 1$ of the approximate couplings $J^\text{app}_\ell$, found by solving Eq. \ref{eq:small_Jl}, with the exact ones. (a) $J^\text{app}_\ell$ (dashed) compared with the exact $J^*_\ell$ (full). (b) Absolute error $J^*_\ell-J^\text{app}_\ell$. (c) Relative error $(J^*_\ell-J^\text{app}_\ell)/J^*_\ell$.
  • Figure 3: Comparison between the gradients obtained by linearizing around the optimal solution $J_\ell^*$ (full lines) and the gradients computed using PyTorch backpropagation on a large dataset (data points), as a function of the distance from the optimal couplings, $\Delta J_\ell = J_\ell-J^*_\ell$. Details: $N = 20$ spins, $\beta = 1$; the dataset consists of $5 \cdot 10^6$ configurations obtained by starting at infinite temperature and then performing 30 MCS at $\beta = 1$.
  • Figure 4: Comparison between the training of the weights predicted by the approximation in Eq. \ref{eq:training_solution} (full lines) and the training performed numerically using PyTorch over a large dataset. Details: $N = 200$ spins, $\beta = 1$; the dataset consists of $5 \cdot 10^6$ equilibrium configurations; learning rate $\eta_\ell = 1/[N(\ell-1)]$.
  • Figure 5: Comparison between theory and numerical results for different quantities. The data come from the same training as in Fig. \ref{fig:comparison_approximation}. (a) Relative error $|\Delta J_{\ell}|/J_{\ell}$ plotted as a function of $\ell$, together with an exponential fit of the form $Ae^{-\frac{\ell}{\lambda}}$ (dashed black lines). (b,c) The values of the fitted parameters $A$ and $\lambda$ (data points) are compared with those derived from the theory (dashed black lines). (d) The effective timescale $\hat{\tau}_\ell =\tau_\ell/\eta_\ell$, obtained by fitting the relative error as $\frac{|\Delta J_\ell|}{J_\ell} = e^{-\frac{t}{\hat{\tau}_\ell}}$, is compared to the prediction from the theory.
  • ...and 5 more figures
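
The dataset described in the Figure 3 caption (random start at infinite temperature, then 30 MCS at $\beta = 1$ with $N = 20$) can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the paper: the Curie-Weiss energy convention $H(\sigma) = -\frac{1}{2N}\left(\sum_i \sigma_i\right)^2$ (chosen so that $\beta_c = 1$), the random-site update order, the hypothetical function names `metropolis_sweeps` and `make_dataset`, and a sample count far smaller than the $5 \cdot 10^6$ used in the caption.

```python
import numpy as np


def metropolis_sweeps(spins, beta, n_sweeps, rng):
    """Single-spin-flip Metropolis on a Curie-Weiss system, updating spins in place.

    Assumed energy convention: H = -(sum_i s_i)^2 / (2N), so that beta_c = 1.
    One sweep (MCS) is taken to mean N attempted flips at random sites.
    """
    n = spins.size
    m_tot = int(spins.sum())  # total magnetization, tracked incrementally
    for _ in range(n_sweeps):
        for _ in range(n):
            i = rng.integers(n)
            s = int(spins[i])
            # Energy change for flipping spin i: M -> M - 2s
            delta_e = 2.0 * (s * m_tot - 1) / n
            if delta_e <= 0 or rng.random() < np.exp(-beta * delta_e):
                spins[i] = -s
                m_tot -= 2 * s
    return spins


def make_dataset(n_spins, beta, n_sweeps, n_samples, seed=0):
    """Each sample: uniform random start (infinite temperature), then n_sweeps MCS at beta."""
    rng = np.random.default_rng(seed)
    data = np.empty((n_samples, n_spins), dtype=np.int8)
    for k in range(n_samples):
        spins = rng.choice(np.array([-1, 1], dtype=np.int8), size=n_spins)
        data[k] = metropolis_sweeps(spins, beta, n_sweeps, rng)
    return data


# Small illustrative run; the caption uses 5e6 configurations.
data = make_dataset(n_spins=20, beta=1.0, n_sweeps=30, n_samples=200)
magnetization = data.mean(axis=1)  # per-sample magnetization in [-1, 1]
```

Tracking the total magnetization incrementally keeps each attempted flip at constant cost, which is possible because the Curie-Weiss energy depends on the configuration only through that total.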