Comparative Analysis of Autonomous and Systematic Control Strategies for Hole-Doped Hubbard Clusters: Reinforcement Learning versus Physics-Guided Design

Shivanshu Dwivedi, Kalum Palandage

TL;DR

An autonomous RL agent, benchmarked across five 3D lattices from tetrahedron to FCC, achieves human-competitive accuracy and outperforms other black-box optimization methods.

Abstract

Engineering electron correlations in quantum dot arrays requires navigating high-dimensional, non-convex parameter spaces in which hole doping fundamentally alters the physics. We present a comparative study of two control paradigms for the Hubbard model doped with one hole away from half filling: (i) systematic physics-guided design and (ii) autonomous deep reinforcement learning with geometry-aware neural architectures. Systematic analysis reveals key design principles, such as field-induced localization for trapping the mobile hole, but it becomes computationally intractable when used for optimization. We show that an autonomous RL agent, benchmarked across five 3D lattices from tetrahedron to FCC, achieves human-competitive accuracy (R^2 > 0.97) and 95.5 percent success on held-out tasks. The agent is 3-4 orders of magnitude more sample-efficient than grid search and outperforms other black-box optimization methods. Transfer learning yields 91 percent few-shot generalization to unseen geometries. This work establishes autonomous RL as a viable and highly efficient framework for rapid optimization and non-obvious strategy discovery in complex quantum systems.
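
For orientation, the textbook single-band Hubbard Hamiltonian and a per-site double occupancy can be written as below. This is a sketch under stated assumptions, not the paper's own equations: the symbols $t$, $U$, $N$, and $N_e$ are standard notation introduced here, the field terms the abstract alludes to are omitted because their form is not given on this page, and the per-site normalization of $D$ is an assumption.

```latex
% Standard single-band Hubbard model (assumed notation; not taken from the paper)
H = -t \sum_{\langle i,j \rangle, \sigma}
      \left( c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.} \right)
    + U \sum_{i} n_{i\uparrow} n_{i\downarrow},
\qquad
n_{i\sigma} = c^{\dagger}_{i\sigma} c_{i\sigma}

% One hole away from half filling on an N-site cluster
N_e = N - 1

% Double occupancy averaged over sites (normalization is an assumption)
D = \frac{1}{N} \sum_{i} \left\langle n_{i\uparrow} n_{i\downarrow} \right\rangle
```

Here $\langle i,j \rangle$ runs over nearest-neighbor bonds, so the lattice geometry enters only through the bond list and the coordination number $Z$.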

Paper Structure

This paper contains 1 section, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: RL agent achieves human-competitive accuracy. Scatter plots comparing the predicted optimal double occupancy from the trained RL agent ($D_{\mathrm{RL}}$) versus the ground truth calculated value ($D_{\mathrm{calc}}$) for 200 held-out tasks. The dashed diagonal line represents perfect agreement ($y=x$). (a) Results for the Tetrahedron geometry ($Z=3$), which exhibits the highest predictive accuracy ($R^2=0.995$). (b) The Octahedron geometry shows strong agreement ($R^2=0.977$). (c) The Simple Cubic (SC) lattice achieves $R^2=0.991$. (d) The Body-Centered Cubic (BCC) lattice shows $R^2=0.980$. (e) The Face-Centered Cubic (FCC) lattice ($Z=12$) maintains high fidelity ($R^2=0.973$) even in the high-coordination regime. The high correlation across all panels confirms the agent generalizes well across distinct topological connectivities.
  • Figure 2: Superhuman sample efficiency. Computational cost comparison shows the RL agent (red) finds optimal solutions after training on only $\sim 2 \times 10^3$ simulation calls, whereas a standard grid search (blue) would require $\sim 5 \times 10^6$ simulations to map the same parameter space. This represents a $\sim 2500 \times$ speedup, making intractable optimization problems tractable.
  • Figure 3: RL agent's intelligent exploration. Normalized best solution found (cumulative reward) versus the number of simulation evaluations (log scale). The RL agent (red) shows a steep, sigmoidal learning curve, consistent with targeted rather than uniform exploration. It significantly outperforms the other black-box optimization methods tested: Bayesian Optimization (purple), a Genetic Algorithm (green), and Random Search (orange).
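
The two headline numbers in the captions above, the $R^2$ agreement in Figure 1 and the speedup in Figure 2, can be reproduced in form with a few lines of Python. This is a minimal sketch, not the authors' evaluation code: the function names `r_squared` and `speedup` are hypothetical, the data are synthetic placeholders, and it assumes $R^2$ is the coefficient of determination measured against the $y=x$ line (the paper may instead report a squared Pearson correlation).

```python
import numpy as np

def r_squared(d_calc, d_rl):
    """Coefficient of determination of predictions d_rl against targets d_calc.

    Residuals are taken relative to the y = x line, not a fitted regression
    line (an assumption about how the captioned R^2 values are defined).
    """
    d_calc = np.asarray(d_calc, dtype=float)
    d_rl = np.asarray(d_rl, dtype=float)
    ss_res = np.sum((d_calc - d_rl) ** 2)           # squared prediction error
    ss_tot = np.sum((d_calc - d_calc.mean()) ** 2)  # spread of the targets
    return 1.0 - ss_res / ss_tot

def speedup(grid_calls, rl_calls):
    """Ratio of simulation calls needed by grid search to those used by the agent."""
    return grid_calls / rl_calls

# Synthetic placeholder data standing in for 200 held-out tasks.
# These values are invented for illustration and are NOT the paper's results.
rng = np.random.default_rng(0)
d_calc = rng.uniform(0.0, 0.25, size=200)
d_rl = d_calc + rng.normal(0.0, 0.005, size=200)

print("R^2 on synthetic data:", r_squared(d_calc, d_rl))

# Call counts quoted in the Figure 2 caption: ~5e6 (grid) versus ~2e3 (RL).
ratio = speedup(5e6, 2e3)
print("speedup:", ratio)
print("orders of magnitude:", np.log10(ratio))
```

With the call counts quoted in Figure 2, the ratio is 2500, which is about 3.4 orders of magnitude and therefore consistent with the "3-4 orders of magnitude" stated in the abstract.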