A method for quantifying the generalization capabilities of generative models for solving Ising models

Qunlong Ma; Zhi Ma; Ming Gao

TL;DR

This work addresses how to quantify the ability of generative models to solve Ising models with complex energy landscapes, by introducing a Hamming distance regularizer within the variational autoregressive networks (VAN) framework. The loss is decomposed as $\mathcal{L}=F_q+R_h$, where $F_q$ is the VAN variational free energy and $R_h=\sum_{\mathbf{s}} | hm_{\mathbf{g}}(\mathbf{s})-z|$ controls the overlap between the sampled configurations $\mathbf{s}$ and the ground state $\mathbf{g}$, giving a controllable measure of generalization. A Gen metric, $Gen=\sum_{k=0}^{\lfloor N/2 \rfloor} 2^{k} SR_k$, combines overlap size with ground-state success rates to compare network architectures (FNN, RNN, GNN) on the Wishart planted ensemble (WPE) and the Sherrington-Kirkpatrick (SK) model. The comparison indicates that 3-layer FVAN and GVAN often generalize best, while RVAN tends to generalize more weakly because of rapid mode collapse. Importantly, performance on small-scale problems predicts relative performance on large-scale problems, which aids neural architecture search (NAS). The approach provides a principled, NAS-oriented framework for selecting VAN architectures that can scale to large Ising problems, with potential applicability to other discrete energy landscapes. The results highlight the roles of architecture, depth, and regularization in shaping the generalization behavior of generative solvers.
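As a rough illustration of how the Gen metric weights its terms, the sketch below evaluates $Gen=\sum_{k=0}^{\lfloor N/2 \rfloor} 2^{k} SR_k$ for a list of success rates. It assumes that `success_rates[k]` holds $SR_k$ as a fraction in $[0, 1]$; the function name `gen_metric`, the argument `n_spins`, and the toy numbers are hypothetical and not taken from the paper.

```python
def gen_metric(success_rates, n_spins):
    """Return Gen = sum_{k=0}^{floor(N/2)} 2**k * SR_k.

    Assumption: success_rates[k] is SR_k, a success rate in [0, 1],
    and exactly floor(N/2) + 1 values are supplied.
    """
    expected_len = n_spins // 2 + 1
    if len(success_rates) != expected_len:
        raise ValueError(
            f"expected {expected_len} success rates, got {len(success_rates)}"
        )
    if any(not 0.0 <= sr <= 1.0 for sr in success_rates):
        raise ValueError("success rates must lie in [0, 1]")
    # Each term is weighted by 2**k, so larger k contributes more per unit SR_k.
    return sum((2 ** k) * sr for k, sr in enumerate(success_rates))


# Toy example with made-up success rates for N = 4 (k = 0, 1, 2).
toy_rates = [1.0, 0.5, 0.25]
print(gen_metric(toy_rates, n_spins=4))  # 1*1.0 + 2*0.5 + 4*0.25 = 3.0
```

Because the weights grow as $2^{k}$, a success at large $k$ counts far more than one at small $k$, so two architectures with similar success rates at small $k$ can still differ substantially in Gen.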

Abstract

For Ising models with complex energy landscapes, whether neural networks can find the ground state depends heavily on the Hamming distance between the training datasets and the ground state. Although various recently proposed generative models have shown good performance in solving Ising models, how to quantify their generalization capabilities has not been adequately discussed. Here we design a Hamming distance regularizer within the framework of a class of generative models, variational autoregressive networks (VAN), to quantify the generalization capabilities of various network architectures combined with VAN. The regularizer controls the size of the overlap between the ground state and the training datasets generated by the networks; together with the success rates of finding the ground state, this overlap forms a quantitative metric of generalization capability. We conduct numerical experiments on several prototypical network architectures combined with VAN, including feed-forward neural networks, recurrent neural networks, and graph neural networks, to quantify their generalization capabilities when solving Ising models. Moreover, because the generalization capabilities measured on small-scale problems can be used to predict relative performance on large-scale problems, our method can assist neural architecture search (NAS) in finding optimal network architectures for solving large-scale Ising models.
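The regularizer described above can be sketched in a few lines, following the form $R_h=\sum_{\mathbf{s}} | hm_{\mathbf{g}}(\mathbf{s})-z|$ and the loss $\mathcal{L}=F_q+R_h$ given in the TL;DR. This is a minimal sketch under stated assumptions, not the authors' implementation: spins are taken to be $\pm 1$ values in plain Python lists, the sum runs over a batch of sampled configurations without any normalization, and the helper names are hypothetical. How the term is weighted and differentiated inside an actual VAN training loop is not specified here.

```python
def hamming_distance(config, ground_state):
    """Count the positions where a configuration differs from the ground state."""
    if len(config) != len(ground_state):
        raise ValueError("configuration and ground state must have equal length")
    return sum(1 for s, g in zip(config, ground_state) if s != g)


def hamming_regularizer(samples, ground_state, z):
    """R_h = sum over samples of |hm_g(s) - z| (unnormalized; an assumption)."""
    return sum(abs(hamming_distance(s, ground_state) - z) for s in samples)


def regularized_loss(free_energy, samples, ground_state, z):
    """L = F_q + R_h, with F_q supplied by the caller as a plain number."""
    return free_energy + hamming_regularizer(samples, ground_state, z)


# Toy batch of 4-spin configurations (made-up values).
ground_state = [1, 1, 1, 1]
samples = [
    [1, 1, 1, 1],    # Hamming distance 0
    [-1, 1, 1, 1],   # Hamming distance 1
    [-1, -1, 1, 1],  # Hamming distance 2
]
# With z = 1 the penalty is |0-1| + |1-1| + |2-1| = 2.
print(hamming_regularizer(samples, ground_state, z=1))
```

The penalty vanishes only when every sample sits at Hamming distance exactly $z$ from the ground state, which is how the target $z$ sets the overlap between the training samples and the ground state.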

Paper Structure (5 sections, 6 equations, 7 figures, 2 tables)

INTRODUCTION
THE HAMMING DISTANCE REGULARIZER
NUMERICAL EXPERIMENTS
CONCLUSION AND DISCUSSIONS
More results sampled at different temperatures

Figures (7)

Figure 1: Schematic diagram of the Hamming distance and the configuration distribution when training with and without the regularizer in the loss function. (a) The change in the average Hamming distance between the ground state and the training datasets, which are sampled directly from the networks during training, with and without the regularizer in the loss function. (b) The spatial distribution of configurations sampled directly from the networks at three phases during training, with and without the regularizer in the loss function.
Figure 2: The success rates of finding the ground state as the number of layers varies on the WPE, with $N=60$ and $\alpha=0.2$. (a) The VAN framework based on FNN. (b) The VAN framework based on RNN. (c) The VAN framework based on GNN.
Figure 3: Scatter plot of the Hamming distance between the samples drawn after training and the ground state, with the X-axis and Y-axis denoting the Hamming distance of the first and last $N/2$ spins, respectively, on the first WPE instance in Fig. 2 with system size $N=60$. (a) The VAN framework based on FNN, with the Hamming distance regularizer (HDR) at $z=0$ and without the regularizer. (b) The VAN framework based on RNN, with the HDR at $z=0$ and without the regularizer. (c) The VAN framework based on GNN, with the HDR at $z=0$ and without the regularizer. (d)-(f) The VAN framework based on the network architectures in (a)-(c), respectively, with the HDR at $z=10$ and without the regularizer. (g)-(i) The VAN framework based on the network architectures in (a)-(c), respectively, with the HDR at $z=30$ and without the regularizer.
Figure 4: The success rates of finding the ground state vary with the value of $z$. (a) On the WPE, with system size $N=60$ and $\alpha=0.2$. (b) On the SK model, with system size $N=60$.
Figure 5: The success rates of finding the ground state vary with the value of $z$ on models with small system sizes. (a) On the WPE, with system size $N=30$ and $\alpha=0.1$. (b) On the SK model, with the system size $N=30$.
...and 2 more figures
