Reinforcement learning for path integrals in quantum statistical physics

Timour Ichmoukhamedov; Dries Sels

TL;DR

The paper addresses the challenge of computing thermal properties of quantum systems via Euclidean path integrals by reframing path-integral sampling as an optimal-control problem and applying reinforcement learning to learn a drift that guides path sampling. The authors propose a two-step framework: a variational RL step that yields an upper bound on the propagator and free energy, followed by a direct-sampling step that can converge to exact results, enabling efficient estimation of the thermal density matrix and the partition function. They demonstrate the approach on a quantum rotor chain, achieving accurate per-site free energy and showing extrapolation of a network trained on $N=9$ to $N=15$ without retraining, along with accelerated convergence for correlation functions. Limitations include neglecting bosonic/fermionic permutations and dependence on a simple importance-sampling prior, with future work pointing to adjoint-method scaling and extensions to polaron physics.
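As a reading aid, the relations behind the two steps can be written out in generic form. The symbols $q$ (a sampling distribution over paths) and $w$ (the corresponding importance weight of a path) are notation assumed here for illustration, not taken from the paper, and whether the paper's own equations match this form term by term is not confirmed here.

```latex
\begin{align}
Z(\beta) &= \int d\mathbf{x}\, K(\mathbf{x},\beta|\mathbf{x},0),
\qquad
F = -\frac{1}{\beta}\ln Z(\beta), \\
Z(\beta) &= \mathbb{E}_{q}[w]
\quad\Longrightarrow\quad
F \le -\frac{1}{\beta}\,\mathbb{E}_{q}[\ln w].
\end{align}
```

The inequality is Jensen's, $\ln \mathbb{E}_q[w] \ge \mathbb{E}_q[\ln w]$, so minimizing the right-hand side over $q$ gives a variational upper bound on $F$. Averaging $w$ itself over samples from the same $q$ is unbiased for $Z$, and its variance shrinks as $q$ approaches the optimal sampler.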

Abstract

Machine learning is rapidly finding its way into the field of computational quantum physics. One of the most popular and widely studied approaches in this direction is to use neural networks to model quantum states (NQS) in the Hamiltonian formulation of quantum mechanics. However, an alternative angle of attack to leverage machine learning in physics is through the path integral formulation, which has so far received far more limited attention. In this paper, we explore how reinforcement learning can be used to compute a class of Euclidean path integrals that yield the thermal density matrix of a quantum system, thereby enabling the computation of the free energy or other thermal expectation values. In particular, we propose a two-step approach with the unique feature that after a variational approximation for a quantity is obtained in a first step, it can then be used to efficiently compute the exact result in a second step. We benchmark this method on several simple systems and then apply it to the quantum rotor chain.
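To make the target quantity concrete, the following is a minimal sketch of estimating a diagonal element of the thermal propagator by plain Brownian-bridge sampling, with no learned drift. It is not the paper's RL scheme. The function names, the units $\hbar = m = 1$, the trapezoidal discretization, and the harmonic-oscillator test case are all assumptions made for this illustration. It relies on the Feynman-Kac identity $K(x,\beta|x,0) = K_{\text{free}}(x,\beta|x,0)\,\mathbb{E}_{\text{bridge}}[e^{-\int_0^\beta V(x(\tau))\,d\tau}]$.

```python
import math
import random

def bridge_propagator_diag(potential, x, beta, n_steps=32, n_paths=5000, seed=0):
    """Estimate K(x, beta | x, 0) for H = p^2/2 + V(x), with hbar = m = 1.

    Hypothetical helper: averages exp(-int V dtau) over Brownian bridges
    that start and end at x, then multiplies by the free propagator.
    """
    rng = random.Random(seed)
    dt = beta / n_steps
    sqrt_dt = math.sqrt(dt)
    k_free = 1.0 / math.sqrt(2.0 * math.pi * beta)  # free K(x, beta | x, 0)
    total = 0.0
    for _path in range(n_paths):
        # Free Brownian motion on the time grid, starting at 0.
        w = [0.0]
        for _step in range(n_steps):
            w.append(w[-1] + sqrt_dt * rng.gauss(0.0, 1.0))
        w_end = w[-1]
        # Pin the endpoint: B_k = W_k - (k / n_steps) * W_end, shifted by x.
        path = [x + w[k] - (k / n_steps) * w_end for k in range(n_steps + 1)]
        # Trapezoidal rule for the potential part of the Euclidean action.
        v = [potential(p) for p in path]
        action = dt * (0.5 * v[0] + sum(v[1:-1]) + 0.5 * v[-1])
        total += math.exp(-action)
    return k_free * total / n_paths

def harmonic_exact_diag(x, beta):
    """Closed-form K(x, beta | x, 0) for V(x) = x^2 / 2."""
    prefactor = math.sqrt(1.0 / (2.0 * math.pi * math.sinh(beta)))
    return prefactor * math.exp(-x * x * math.tanh(beta / 2.0))

estimate = bridge_propagator_diag(lambda y: 0.5 * y * y, x=0.5, beta=1.0)
exact = harmonic_exact_diag(0.5, 1.0)
```

At this modest $\beta$ the bridge weights vary little, so the estimate converges quickly. At low temperature or strong coupling the weights spread out, which is the regime where a better-informed sampler becomes worthwhile.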

Paper Structure

This paper contains 10 sections, 16 equations, 8 figures.

Introduction
Methodology
Propagator
Optimizing the control function with RL
Free energy
Results
Quantum rotor chain
Extrapolation sampling
Correlation function
Discussion

Figures (8)

Figure 1: Summary of the central results of this paper. Left panel: we use a shared bidirectional LSTM architecture to train a path integral sampler for the quantum rotor chain (simplified diagram). After training on a fixed number of particles, the network can be used directly on larger systems without retraining. Right panel (at $J=1$, $\beta=5$): several independent direct sampling runs of the free energy with an LSTM trained on $N=9$ and then applied to $N=15$. Compared with a simpler bridge control strategy that uses no neural network, the trained sampler shows major improvements in path sampling convergence.
Figure 2: Benchmarking results for the diagonal of the propagator $K(\mathbf{x},\beta|\mathbf{x},0)$ for the anharmonic oscillator $\hat{H}=p^2/2 + x^2/2 + \lambda x^4$ (left at $\lambda=5$) and the hydrogen atom $\hat{H}=\mathbf{p}^2/2 - \frac{1}{|\mathbf{x}|}$ (right) at different temperatures (top row $\beta=1/2$, bottom row $\beta=5$). The inset shows that the variational step (magenta diamonds) indeed provides an upper bound, while the result from the second direct sampling step (black squares) is in agreement with exact diagonalization (blue line).
Figure 3: Visualization of the optimized control function $u_\theta$ (including the bridge term) for the anharmonic oscillator at $\lambda=5$, $\beta=5$ and for $x_0=x_T=1$.
Figure 4: The free energy of the anharmonic oscillator $\hat{H}=p^2/2 + x^2/2 + \lambda x^4$ as a function of the coupling strength $\lambda$, at different temperatures $\beta$. Variational results from Eq. \ref{eq:frenergy_inequality} (magenta diamonds) are compared with direct sampling via Eq. \ref{eq:dirsample_partition} using the trained $u_\theta(...|\mathbf{z})$, and benchmarked against exact diagonalization (blue line).
Figure 5: The free energy per particle of the rotor chain at different temperatures $\beta$ and system sizes $N$. For $N=3$, an independently computed exact diagonalization benchmark (thick blue line) is in excellent agreement with both the variational (magenta triangles) and direct (black squares) sampling approaches. For $N=9$, the exact diagonalization result can no longer be obtained, but the variational and direct sampling results still agree excellently.
...and 3 more figures
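The exact diagonalization benchmark mentioned in the captions of Figures 2 and 4 can be illustrated for the anharmonic oscillator with a short sketch. The captions do not say how the diagonalization was carried out, so everything here is an assumption for illustration: the truncated harmonic-oscillator number basis, the names `anharmonic_levels` and `free_energy`, and the cutoffs `n_basis` and `n_keep`.

```python
import numpy as np

def anharmonic_levels(lam, n_basis=120, n_keep=60):
    """Lowest eigenvalues of H = p^2/2 + x^2/2 + lam * x^4 (hbar = m = 1).

    H is built in a truncated harmonic-oscillator number basis; n_basis and
    n_keep are illustrative choices, not values from the paper.
    """
    n = np.arange(n_basis)
    # Annihilation operator: <n-1| a |n> = sqrt(n).
    a = np.diag(np.sqrt(n[1:]), k=1)
    x = (a + a.T) / np.sqrt(2.0)
    h = np.diag(n + 0.5) + lam * np.linalg.matrix_power(x, 4)
    # Keep only low-lying levels; the top of the spectrum is distorted
    # by the basis truncation.
    return np.linalg.eigvalsh(h)[:n_keep]

def free_energy(lam, beta, **kwargs):
    """F = -(1/beta) * ln sum_n exp(-beta * E_n), shifted by E_0 for stability."""
    e = anharmonic_levels(lam, **kwargs)
    e0 = e[0]
    return e0 - np.log(np.sum(np.exp(-beta * (e - e0)))) / beta

# Harmonic limit, where F = (1/beta) * ln(2 sinh(beta/2)) is known in closed form.
f_harmonic = free_energy(0.0, 1.0)
# Couplings and temperature as in the figure captions (lambda = 5, beta = 5).
f_strong = free_energy(5.0, 5.0)
```

A quick convergence check is to increase `n_basis` and confirm that the retained levels stop changing. At strong coupling the unit-frequency basis converges more slowly, so the cutoffs above should be treated as starting points.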
