Reinforcement learning for path integrals in quantum statistical physics
Timour Ichmoukhamedov, Dries Sels
TL;DR
The paper computes thermal properties of quantum systems from Euclidean path integrals by recasting path-integral sampling as an optimal-control problem and using reinforcement learning (RL) to learn a drift that guides the sampled paths. The authors propose a two-step framework. First, a variational RL step yields a variational bound on the propagator and a corresponding upper bound on the free energy. Second, a direct-sampling step reuses the learned drift and can converge to the exact result, which enables efficient estimation of the thermal density matrix and the partition function. They demonstrate the approach on a quantum rotor chain, where it gives an accurate free energy per site, a network trained on $N=9$ sites extrapolates to $N=15$ without retraining, and correlation functions converge faster. Limitations include the neglect of bosonic and fermionic permutations and the reliance on a simple importance-sampling prior. Suggested future work includes scaling via the adjoint method and extensions to polaron physics.
Abstract
Machine learning is rapidly finding its way into the field of computational quantum physics. One of the most popular and widely studied approaches in this direction is to use neural networks to model quantum states (NQS) in the Hamiltonian formulation of quantum mechanics. However, an alternative angle of attack to leverage machine learning in physics is through the path integral formulation, which has so far received far more limited attention. In this paper, we explore how reinforcement learning can be used to compute a class of Euclidean path integrals that yield the thermal density matrix of a quantum system, thereby enabling the computation of the free energy or other thermal expectation values. In particular, we propose a two-step approach with the unique feature that after a variational approximation for a quantity is obtained in a first step, it can then be used to efficiently compute the exact result in a second step. We benchmark this method on several simple systems and then apply it to the quantum rotor chain.
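To make the two-step idea concrete, here is a minimal sketch on a toy problem. It is an illustration under stated assumptions, not the authors' implementation. It assumes units with $\hbar = m = 1$, a harmonic potential $V(x) = x^2/2$, and a free-endpoint Feynman-Kac expectation $\psi = \mathbb{E}[\exp(-\int_0^T V(X_t)\,dt)]$ over Brownian paths started at $x = 0$ rather than a pinned-endpoint propagator. The function `simulate`, the one-parameter linear drift $u = -kx$, and the grid search that stands in for reinforcement learning are all hypothetical choices made for this example. Step 1 uses Jensen's inequality to obtain the upper bound $-\log\psi \le \mathbb{E}_u[\int_0^T (V + u^2/2)\,dt]$. Step 2 reweights paths sampled with the same drift by the Girsanov factor, which gives an estimator that is unbiased for any drift.

```python
import numpy as np

def simulate(k, n_paths=20000, n_steps=100, T=1.0, seed=0):
    """Euler-Maruyama paths of dX = u dt + dW with the hypothetical drift u = -k x.

    Returns, per path, the running cost  int (V + u^2/2) dt  and the
    log-weight  -int V dt - int u dW - int u^2/2 dt  (Girsanov reweighting).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros(n_paths)        # all paths start at x = 0
    cost = np.zeros(n_paths)
    log_w = np.zeros(n_paths)
    for _ in range(n_steps):
        u = -k * x               # controlled drift (assumed linear family)
        v = 0.5 * x**2           # harmonic potential V(x) = x^2 / 2
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        cost += (v + 0.5 * u**2) * dt
        log_w += -v * dt - u * dw - 0.5 * u**2 * dt
        x = x + u * dt + dw
    return cost, log_w

# Step 1 (variational): pick the drift strength that minimizes the upper bound.
# A plain grid search stands in for the reinforcement-learning optimization.
ks = np.linspace(0.0, 1.5, 7)
bounds = [simulate(k)[0].mean() for k in ks]
k_best = ks[int(np.argmin(bounds))]
f_bound = min(bounds)

# Step 2 (direct sampling): reweight fresh paths drawn with the chosen drift.
_, log_w = simulate(k_best, seed=1)
f_est = -np.log(np.exp(log_w).mean())

# Closed-form reference for this toy problem: psi = cosh(T)^(-1/2) with T = 1.
f_exact = 0.5 * np.log(np.cosh(1.0))

print(f"best k = {k_best:.2f}")
print(f"variational upper bound = {f_bound:.4f}")
print(f"reweighted estimate     = {f_est:.4f}")
print(f"exact value             = {f_exact:.4f}")
```

For this potential the optimal drift is known in closed form, $u^*(x,t) = -\tanh(T-t)\,x$, so the time-independent linear family cannot make the bound tight. The reweighted estimate in step 2 still targets the exact value, and a better drift only reduces the variance of the weights.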
