F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation
Ethan Decker, Christopher Watson, Junyu Zhou, Yuhao Liu, Chenxu Liu, Ang Li, Gushu Li, Samuel Stein
TL;DR
This work presents F2, an offline reinforcement learning framework that leverages free-fermionic structure to efficiently compile Trotter-based Hamiltonian simulations. By modeling the compilation of a single Trotter step as an RL problem over a Lie algebra of unitary operators and employing a dual-tower transformer with compositional action embeddings, F2 achieves substantial reductions in gate count ($\approx$46–47%) and circuit depth ($\approx$36–38%) while preserving high fidelity (errors of $\approx 10^{-7}$). A key novelty is the trajectory-reversal data augmentation, which yields abundant, guaranteed-successful offline data, together with a geometry-aware critic regularizer that aligns value estimates with the distance to the identity. The results across lattice models, materials, and protein fragments (12–222 qubits) demonstrate the potential of learning-guided, structure-aware quantum compilation for scalable quantum simulations, with implications for practical quantum advantage on near- to mid-term devices.
Abstract
Compiling shallow and accurate quantum circuits for Hamiltonian simulation remains challenging due to hardware constraints and the combinatorial complexity of minimizing gate count and circuit depth. Existing optimization pipelines rely on hand-engineered classical heuristics, which cannot learn input-dependent structure and therefore miss substantial opportunities for circuit reduction. We introduce F2, an offline reinforcement learning framework that exploits free-fermionic structure to efficiently compile Trotter-based Hamiltonian simulation circuits. F2 provides (i) a reinforcement-learning environment over classically simulatable free-fermionic subroutines, (ii) architectural and objective-level inductive biases that stabilize long-horizon value learning, and (iii) a reversible synthetic-trajectory generation mechanism that consistently yields abundant, guaranteed-successful offline data. Across benchmarks spanning lattice models, protein fragments, and crystalline materials (12–222 qubits), F2 reduces gate count by 47% and depth by 38% on average relative to strong baselines (Qiskit, Cirq/OpenFermion) while maintaining average errors of $10^{-7}$. These results show that aligning deep reinforcement learning with the algebraic structure of quantum dynamics enables substantial improvements in circuit synthesis, suggesting a promising direction for scalable, learning-based quantum compilation.
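The reversible trajectory-generation idea in (iii) can be illustrated with a toy sketch. It assumes, purely for illustration, that a number-conserving free-fermionic circuit on n modes is tracked classically as an n x n orthogonal matrix built from real Givens rotations, and that the "distance to the identity" is a Frobenius norm. The function names (`givens`, `reversed_trajectory`, `replay`), the action encoding, and the distance measure are hypothetical and are not taken from F2 itself, whose environment and action space may differ.

```python
import numpy as np

def givens(n, i, j, theta):
    """Return the n x n real Givens rotation mixing modes i and j by theta."""
    g = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    g[i, i], g[i, j] = c, -s
    g[j, i], g[j, j] = s, c
    return g

def reversed_trajectory(n_modes, n_steps, rng):
    """Scramble the identity with random rotations (hypothetical action set),
    then return the scrambled target and the inverse actions that undo it."""
    state = np.eye(n_modes)
    forward = []
    for _ in range(n_steps):
        i, j = rng.choice(n_modes, size=2, replace=False)
        theta = rng.uniform(-np.pi, np.pi)
        state = givens(n_modes, i, j, theta) @ state
        forward.append((int(i), int(j), float(theta)))
    # Reversing the order and negating each angle inverts the whole sequence,
    # so this action list is guaranteed to drive the target back to identity.
    actions = [(i, j, -theta) for (i, j, theta) in reversed(forward)]
    return state, actions

def replay(target, actions):
    """Apply actions to the target and record the Frobenius distance to the
    identity before the first action and after every action."""
    n = target.shape[0]
    identity = np.eye(n)
    state = target.copy()
    distances = [float(np.linalg.norm(state - identity))]
    for i, j, theta in actions:
        state = givens(n, i, j, theta) @ state
        distances.append(float(np.linalg.norm(state - identity)))
    return state, distances
```

Calling `reversed_trajectory(6, 20, np.random.default_rng(0))` and passing the result to `replay` produces a 20-step episode that ends at the identity by construction, which is the sense in which such synthetic data is guaranteed to succeed. The recorded distances are the kind of geometric signal a critic regularizer could be aligned with, under the stated assumptions.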
