Optimizing p-spin models through hypergraph neural networks and deep reinforcement learning

Li Zeng; Mutian Shen; Tianle Pu; Zohar Nussinov; Qing Feng; Chao Chen; Zhong Liu; Changjun Fan

TL;DR

PLANCK is a physics-inspired deep reinforcement learning framework built on hypergraph neural networks. It bridges statistical mechanics and reinforcement learning and achieves near-optimal solutions for a broad class of NP-hard combinatorial problems.

Abstract

p-spin glasses, characterized by frustrated many-body interactions beyond the conventional pairwise case (p>2), are prototypical disordered systems whose ground-state search is NP-hard and computationally prohibitive for large instances. Solving this problem is not only fundamental for understanding high-order disorder, structural glasses, and topological phases, but also central to a wide spectrum of hard combinatorial optimization tasks. Despite decades of progress, an efficient and scalable solver for generic large-scale p-spin models is still lacking. Here we introduce PLANCK, a physics-inspired deep reinforcement learning framework built on hypergraph neural networks. PLANCK directly optimizes arbitrary high-order interactions and systematically exploits gauge symmetry throughout both training and inference. Trained exclusively on small synthetic instances, PLANCK exhibits strong zero-shot generalization to systems orders of magnitude larger, and consistently outperforms state-of-the-art thermal annealing methods across all tested structural topologies and coupling distributions. Moreover, without any modification, PLANCK achieves near-optimal solutions for a broad class of NP-hard combinatorial problems, including random k-XORSAT, hypergraph max-cut, and conventional max-cut. The presented framework provides a physics-inspired algorithmic paradigm that bridges statistical mechanics and reinforcement learning. The symmetry-aware design not only advances the tractable frontiers of high-order disordered systems, but also opens a promising avenue for machine-learning-based solvers to tackle previously intractable combinatorial optimization challenges.
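To make the gauge symmetry mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common convention $H = -\sum_e J_e \prod_{i \in e} \sigma_i$ (consistent with the unsatisfied-bond criterion in the Figure 1 caption) and the standard gauge transformation $\sigma_i \to t_i \sigma_i$, $J_e \to J_e \prod_{i \in e} t_i$ with $t_i = \pm 1$. The function names `energy` and `gauge_transform` and the toy hypergraph are hypothetical, chosen only for illustration.

```python
import numpy as np

def energy(spins, hyperedges, couplings):
    """Assumed convention: H = -sum_e J_e * prod_{i in e} s_i."""
    return -sum(J * np.prod(spins[list(e)]) for e, J in zip(hyperedges, couplings))

def gauge_transform(spins, hyperedges, couplings, t):
    """Flip spins by t (entries +/-1) and absorb the flips into the couplings."""
    new_spins = spins * t
    new_couplings = np.array(
        [J * np.prod(t[list(e)]) for e, J in zip(hyperedges, couplings)]
    )
    return new_spins, new_couplings

# Toy p=3 instance (hypothetical): 6 sites, 5 three-spin hyperedges.
rng = np.random.default_rng(0)
n = 6
hyperedges = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (0, 4, 5)]
couplings = rng.normal(size=len(hyperedges))  # Gaussian couplings
spins = rng.choice([-1, 1], size=n)
t = rng.choice([-1, 1], size=n)

g_spins, g_couplings = gauge_transform(spins, hyperedges, couplings, t)
e_before = energy(spins, hyperedges, couplings)
e_after = energy(g_spins, hyperedges, g_couplings)
print(e_before, e_after)  # the two energies agree up to floating-point error
```

Because $t_i^2 = 1$, every term $J_e \prod_{i \in e} \sigma_i$ is unchanged, so the transformed instance is energetically identical. Choosing `t = spins` maps any configuration to the all-spins-up state of an equivalent instance.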

Paper Structure (9 sections, 12 equations, 8 figures)

High-order spin glasses and NP-hard optimization
PLANCK architecture and empirical performance
Conclusion and outlook

Figures (8)

Figure 1: Illustrative ground-state optimization on a hexagonal lattice. Ground-state search on an $L=8$ hexagonal ($p=6$) lattice with fixed boundary conditions and Gaussian couplings. Each hexagon represents a six-spin interaction (opacity scales with $|J|$; blue-shaded areas indicate unsatisfied bonds with $-J\prod_i\sigma_i > 0$). Snapshots compare the steady-state configurations reached by Greedy search, Simulated Annealing (SA), Parallel Tempering (PT), and PLANCK. PLANCK is the only method that reaches the exact ground state (verified by Gurobi, a branch-and-bound-based exact solver), whereas the other methods stall in higher-energy local minima.
Figure 2: Structural topologies and landscape complexity of $p$-spin glasses. a, Depiction of higher-order structural topologies and their broad physical applications. Left: Representative $p$-spin models on triangular ($p=3$), square ($p=4$), and hexagonal ($p=6$) Edwards–Anderson lattices. Right: These models offer theoretical tools for probing complex systems such as structural glasses and supercooled liquids, and for tackling decoding problems in quantum topological color codes. b, Hierarchy of energy landscape topologies across $p$-spin models. For $p=2$ (SK model), the landscape reflects the tension between the full replica symmetry breaking (RSB) scenario and the droplet picture. In the intermediate range $2 < p < \infty$, the system enters a Gardner phase, creating a rugged, fractal‑like free‑energy landscape. In the limit $p \to \infty$, the landscape reduces to the uncorrelated energy levels of the Random Energy Model (REM).
Figure 3: PLANCK framework (I): training stage and learning components. During training, we generate small synthetic $p$-spin glass instances, represent each instance as a hypergraph (nodes correspond to sites and hyperedges represent multi-spin interactions), and add these hypergraphs to the training pool. The PLANCK agent is optimized by sampling from this pool and interacting with instances in episodes that traverse from an all-spins-up to an all-spins-down configuration. At each step, gauge transformations produce energetically identical representations; the encoder $\Theta_\mathcal{E}$ performs message passing to produce node embeddings, and the decoder $\Theta_\mathcal{D}$ estimates $Q$-values for spin flips. Experience transitions are stored in a replay buffer, and mini-batches are sampled to update $\{\Theta_\mathcal{E}, \Theta_\mathcal{D}\}$ via gradient descent.
Figure 4: PLANCK framework (II): application stage and hybrid inference. In the application stage, an instance is initialized at a high-temperature phase with a random configuration $S_t$. The system probabilistically selects between a traditional energy-based metaheuristic method (EMH) and a PLANCK metaheuristic (PMH). If PMH is chosen, a gauge transformation GT($\cdot$) maps the system to an equivalent representation; the trained PLANCK performs coordinated spin flips to produce an energy trajectory, and a conditional update mechanism is applied (updating to the lowest-energy state when $\Delta E \le 0$, or to a perturbed state when $\Delta E > 0$), followed by the inverse transformation GT$^{-1}$($\cdot$) to yield the next state. If EMH is selected, a Metropolis-Hastings annealing process is executed, accepting flips with probability $\min[1, e^{-\beta_t \Delta E}]$. The global minimum-energy configuration found through this hybrid strategy is output as the predicted ground state in the low-temperature phase.
Figure 5: Performance of different methods in minimizing the $p$-spin Hamiltonian. We compare the disorder-averaged "ground-state" energy per bond predicted by each method, denoted as $e_0$. Each result is computed over 50 independent instances, with mean values (bars) and standard error of the mean (SEM; error bars) reported. The PLANCK agent tested here is trained exclusively on small synthetic instances ($L=5$ for triangular $p=3$ and square $p=4$; $L=4$ for hexagonal $p=6$). It is then tested, without any retraining or fine-tuning, on systems 4–6 times larger than those seen during training. Both simulated annealing (SA) and parallel tempering (PT) use an identical computational budget ($N_{\rm init}=5000$). Across all lattice types and both Bimodal (a–c) and Gaussian (d–f) couplings, PLANCK (red) consistently finds the lowest-energy configurations among all methods. These results demonstrate that the policy PLANCK learned on small instances transfers effectively to much larger, never-before-seen instances.
...and 3 more figures
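
The energy-based branch in the Figure 4 caption accepts a spin flip with probability $\min[1, e^{-\beta_t \Delta E}]$. The sketch below illustrates that Metropolis rule on a small $p$-spin instance. It is a plain single-spin-flip annealer under stated assumptions, not the PLANCK hybrid procedure: the Hamiltonian convention $H = -\sum_e J_e \prod_{i \in e} \sigma_i$, the linear $\beta$ schedule, the toy instance, and all function names are hypothetical choices made for illustration.

```python
import math
import random

def total_energy(spins, hyperedges, couplings):
    """Assumed convention: H = -sum_e J_e * prod_{i in e} s_i."""
    return -sum(J * math.prod(spins[i] for i in e)
                for e, J in zip(hyperedges, couplings))

def delta_energy(spins, hyperedges, couplings, incident, i):
    """Energy change from flipping spin i: 2 * sum over hyperedges containing i."""
    return 2.0 * sum(
        couplings[k] * math.prod(spins[j] for j in hyperedges[k])
        for k in incident[i]
    )

def metropolis_accept(delta_e, beta, rng):
    """Accept with probability min(1, exp(-beta * delta_e))."""
    return delta_e <= 0 or rng.random() < math.exp(-beta * delta_e)

def anneal(hyperedges, couplings, n, betas, seed=0):
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    incident = [[] for _ in range(n)]  # hyperedge indices touching each site
    for k, e in enumerate(hyperedges):
        for j in e:
            incident[j].append(k)
    e_cur = total_energy(spins, hyperedges, couplings)
    best, best_e = spins[:], e_cur
    for beta in betas:              # one sweep of n proposals per temperature
        for _ in range(n):
            i = rng.randrange(n)
            d_e = delta_energy(spins, hyperedges, couplings, incident, i)
            if metropolis_accept(d_e, beta, rng):
                spins[i] = -spins[i]
                e_cur += d_e
                if e_cur < best_e:  # track the lowest energy seen so far
                    best, best_e = spins[:], e_cur
    return best, best_e

# Toy p=3 instance with bimodal couplings (hypothetical sizes).
inst_rng = random.Random(1)
n = 12
hyperedges = [tuple(inst_rng.sample(range(n), 3)) for _ in range(20)]
couplings = [inst_rng.choice((-1.0, 1.0)) for _ in hyperedges]
betas = [0.1 + 0.1 * k for k in range(50)]  # assumed linear cooling schedule

best, best_e = anneal(hyperedges, couplings, n, betas)
print(best_e)
```

Returning the lowest-energy configuration seen over the whole run, rather than the final one, mirrors the caption's statement that the global minimum-energy configuration found is output as the predicted ground state.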
