
F2: Offline Reinforcement Learning for Hamiltonian Simulation via Free-Fermionic Subroutine Compilation

Ethan Decker, Christopher Watson, Junyu Zhou, Yuhao Liu, Chenxu Liu, Ang Li, Gushu Li, Samuel Stein

TL;DR

This work presents F2, an offline reinforcement learning framework that leverages free-fermionic structure to efficiently compile Trotter-based Hamiltonian simulations. By modeling the compilation of a single Trotter step as an RL problem over a Lie group of unitary operators and employing a dual-tower transformer with compositional action embeddings, F2 achieves substantial reductions in gate count (about 46–47%) and circuit depth (about 36–38%) while preserving high-fidelity evolutions (errors of about 10^(-7)). A key novelty is the trajectory-reversal data augmentation, which yields abundant, guaranteed-successful offline data, together with a geometry-aware critic regularizer that aligns value estimates with the distance to the identity. The results across lattice models, materials, and protein fragments (12–222 qubits) demonstrate the potential of learning-guided, structure-aware quantum compilation for scalable quantum simulations, with implications for practical quantum advantage on near- to mid-term devices.
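The two mechanisms in the TL;DR, compiling by multiplying actions into a state until it reaches the identity and reversing randomly generated trajectories, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on assumptions not stated in the source: a free-fermionic unitary on n modes is tracked by its 2n×2n real orthogonal matrix acting on Majorana operators (the standard classical description of free-fermionic dynamics), actions are Givens rotations G(i, j, θ) drawn from a small discrete angle set, and the terminal test is the Frobenius distance to the identity. All function names are hypothetical.

```python
import math
import random

# Assumption: a free-fermionic unitary on n modes is tracked by its 2n x 2n real
# orthogonal matrix acting on Majorana operators; actions are hypothetical
# Givens rotations G(i, j, theta) with theta drawn from a discrete set.
Matrix = list[list[float]]


def identity(d: int) -> Matrix:
    return [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]


def apply_givens(m: Matrix, i: int, j: int, theta: float) -> Matrix:
    """Return G(i, j, theta) @ m by rotating rows i and j of m."""
    c, s = math.cos(theta), math.sin(theta)
    out = [row[:] for row in m]
    out[i] = [c * a - s * b for a, b in zip(m[i], m[j])]
    out[j] = [s * a + c * b for a, b in zip(m[i], m[j])]
    return out


def distance_to_identity(m: Matrix) -> float:
    """Frobenius distance ||m - I||_F (zero exactly at the terminal state)."""
    return math.sqrt(sum((v - (1.0 if r == c else 0.0)) ** 2
                         for r, row in enumerate(m) for c, v in enumerate(row)))


def reversed_trajectory(n_modes: int, length: int, angles: list[float],
                        rng: random.Random):
    """Generate a random target and a transition list that compiles it to I."""
    d = 2 * n_modes
    # 1) Forward pass: compose random actions onto I to obtain a target state.
    state, forward = identity(d), []
    for _ in range(length):
        i, j = rng.sample(range(d), 2)
        theta = rng.choice(angles)
        forward.append((i, j, theta))
        state = apply_givens(state, i, j, theta)
    target = state
    # 2) Reverse pass: undo actions last-to-first; G(i, j, theta)^-1 = G(i, j, -theta).
    transitions = []
    for i, j, theta in reversed(forward):
        action = (i, j, -theta)
        next_state = apply_givens(state, *action)
        transitions.append((state, action, next_state))
        state = next_state
    return target, transitions


rng = random.Random(0)
target, transitions = reversed_trajectory(3, 8, [math.pi / 8, math.pi / 4], rng)
print(len(transitions), distance_to_identity(transitions[-1][2]) < 1e-9)  # → 8 True
```

Because the reverse pass undoes the forward actions exactly, every stored trajectory ends at the identity by construction, which is what makes this kind of synthetic data guaranteed-successful without any search.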

Abstract

Compiling shallow and accurate quantum circuits for Hamiltonian simulation remains challenging due to hardware constraints and the combinatorial complexity of minimizing gate count and circuit depth. Existing optimization pipelines rely on hand-engineered classical heuristics, which cannot learn input-dependent structure and therefore miss substantial opportunities for circuit reduction. We introduce F2, an offline reinforcement learning framework that exploits free-fermionic structure to efficiently compile Trotter-based Hamiltonian simulation circuits. F2 provides (i) a reinforcement-learning environment over classically simulatable free-fermionic subroutines, (ii) architectural and objective-level inductive biases that stabilize long-horizon value learning, and (iii) a reversible synthetic-trajectory generation mechanism that consistently yields abundant, guaranteed-successful offline data. Across benchmarks spanning lattice models, protein fragments, and crystalline materials (12–222 qubits), F2 reduces gate count by 47% and depth by 38% on average relative to strong baselines (Qiskit, Cirq/OpenFermion) while maintaining average errors of 10^(-7). These results show that aligning deep reinforcement learning with the algebraic structure of quantum dynamics enables substantial improvements in circuit synthesis, suggesting a promising direction for scalable, learning-based quantum compilation.
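The objective-level inductive bias in item (ii), a critic regularized toward the geometry of the problem, can be sketched as a loss function. The assumptions here are ours, not the source's: a reward of -1 per action, a TD(0) critic, and a fixed weight `lam` (the Figure 5 caption indicates the authors instead learn the weighting). The hypothetical `geometric_prior` turns distance to the identity into a value estimate by assuming one action moves the state by at most `step_reach`. For a plane rotation by angle θ, ||G - I||_F = 2√2|sin(θ/2)|, and right-multiplying by an orthogonal state preserves this norm, so by the triangle inequality the largest such value over the action set is one natural choice for `step_reach`.

```python
def geometric_prior(distance: float, step_reach: float, gamma: float) -> float:
    """Hypothetical value prior from geometry.

    Assumes a reward of -1 per action and that one action moves the state by at
    most `step_reach` in distance, so at least k = distance / step_reach more
    actions are needed; the discounted return of k such steps is
    -(1 - gamma**k) / (1 - gamma), or -k when gamma == 1.
    """
    k = distance / step_reach
    if gamma == 1.0:
        return -k
    return -(1.0 - gamma ** k) / (1.0 - gamma)


def critic_loss(batch, value_fn, gamma: float, lam: float, step_reach: float) -> float:
    """Mean TD(0) error plus a penalty pulling V(s) toward the geometric prior.

    batch: list of (state, reward, next_state, done, distance_to_identity).
    lam: fixed regularization weight (a simplification of learned weightings).
    """
    td_sum, geo_sum = 0.0, 0.0
    for state, reward, next_state, done, dist in batch:
        v = value_fn(state)
        bootstrap = 0.0 if done else gamma * value_fn(next_state)
        td_sum += (v - (reward + bootstrap)) ** 2
        geo_sum += (v - geometric_prior(dist, step_reach, gamma)) ** 2
    n = len(batch)
    return td_sum / n + lam * geo_sum / n


# Toy check: scalar "states" whose value is the state itself.
batch = [(-2.0, -1.0, -1.0, False, 2.0), (-1.0, -1.0, 0.0, True, 1.0)]
print(critic_loss(batch, lambda s: s, gamma=1.0, lam=0.5, step_reach=1.0))  # → 0.0
```

In the toy check the value function already agrees with both the Bellman targets and the geometric prior, so both terms vanish; a critic that ignores distance to the identity would be penalized even where sparse rewards give little TD signal.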

Paper Structure

This paper contains 31 sections, 25 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The goal is to optimize a Trotter step. 1) The input Hamiltonian is separated into classically representable terms and exponential terms. 2) Exponential unitaries are compiled using state-of-the-art techniques, while classically simulatable unitaries are decomposed with our reinforcement-learning algorithm. 3) The policy architecture constructs a trajectory by sequentially multiplying actions into the target unitary until the terminal state is reached. 4) Actions have theoretical decompositions into circuit representations, creating a mapping between states and quantum circuits. 5) After both compilers complete circuit synthesis, the resulting circuits are concatenated to produce a deeply optimized Trotter step.
  • Figure 2: Our two-tower neural network architecture using custom compositional embeddings. The compositional embeddings factorize each feature of the action into a custom embedding and then mix these features together to form the final embedding.
  • Figure 3: Comparing learning efficiency over one epoch on distance-to-identity regression. The architectures compared are naive action-space embeddings versus the compositional embeddings introduced in Figure 2. The x-axis is in units of 1,000 steps.
  • Figure 4: 500 random unitaries with small-angle rotations compiled by F2. Fidelity is grouped by error ($1-F$).
  • Figure 5: 100 random unitaries with small-angle rotations compiled by Monte Carlo-pretrained F2. "Geometric critic" denotes our regularized objective with learned weightings, while "critic" denotes a model trained on a reward function shaped by geometric returns with a flat scaling. Fidelity is grouped by error ($1-F$).
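The compositional embeddings of Figure 2 can be sketched as follows, assuming (hypothetically) that an action is described by a tuple of discrete features such as two mode indices and an angle bin. Instead of one table row per full action, whose count grows multiplicatively with the number of features, each feature gets its own small table and a mixing layer combines them. The class name, feature sizes, pure-Python layers, and random initialization are illustrative only; the paper's model is a dual-tower transformer whose exact feature set is not given here.

```python
import math
import random


class CompositionalActionEmbedding:
    """Hypothetical sketch: one small table per action feature, concatenated and
    mixed by a dense tanh layer into a single action vector."""

    def __init__(self, feature_sizes, feat_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # One table per feature, e.g. first mode index, second mode index, angle bin.
        self.tables = [[[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                        for _ in range(size)] for size in feature_sizes]
        in_dim = feat_dim * len(feature_sizes)
        # Mixing layer (out_dim x in_dim weights plus bias) lets features interact.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, action):
        # Look up each factor in its own table, then concatenate the pieces.
        x = [v for f, idx in enumerate(action) for v in self.tables[f][idx]]
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]

    def num_table_rows(self):
        return sum(len(table) for table in self.tables)


# 4 modes -> 8 Majorana indices; 8 angle bins (hypothetical sizes).
emb = CompositionalActionEmbedding([8, 8, 8], feat_dim=4, out_dim=6)
vec = emb((0, 3, 5))
print(len(vec), emb.num_table_rows())  # → 6 24 (a flat table would need 8 * 8 * 8 = 512 rows)
```

Sharing per-feature tables means that every action using, say, mode index 3 updates the same parameters, which is one plausible reason a factorized embedding can learn faster than a naive per-action table of the kind compared in Figure 3.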