Self-Tuning Hamiltonian Monte Carlo for Accelerated Sampling

Henrik Christiansen, Federico Errica, Francesco Alesiani

TL;DR

The paper tackles the challenge of tuning Hamiltonian Monte Carlo parameters (timestep and trajectory length) for efficient sampling. It introduces a fully differentiable simulation framework that uses a local proxy loss to guide gradient-based optimization of Δt and n, including learning a distribution over the number of steps and enabling atom-dependent timesteps. Demonstrations on the harmonic oscillator and alanine dipeptide show substantial gains, including over 100x speed-up in parameter optimization and roughly 25% reductions in autocorrelation with per-atom timesteps, without extra computational cost. The work highlights timestep jittering as essential to avoid rugged loss landscapes and points to promising extensions to more complex systems and learned integrators.

Abstract

The performance of Hamiltonian Monte Carlo simulations crucially depends on both the integration timestep and the number of integration steps. We present an adaptive general-purpose framework to automatically tune such parameters, based on a local loss function which promotes the fast exploration of phase-space. We show that a good correspondence between loss and autocorrelation time can be established, allowing for gradient-based optimization using a fully-differentiable set-up. The loss is constructed in such a way that it also allows for gradient-driven learning of a distribution over the number of integration steps. Our approach is demonstrated for the one-dimensional harmonic oscillator and alanine dipeptide, a small protein commonly used as a test case for simulation methods. Through the application to the harmonic oscillator, we highlight the importance of not using a fixed timestep: a fixed timestep produces a rugged loss surface with many local minima that can trap the optimization. In the case of alanine dipeptide, by tuning the only free parameter of our loss definition, we find a good correspondence between the loss and the autocorrelation times, resulting in a $>100$-fold speed-up in the optimization of simulation parameters compared to a grid search. For this system, we also extend the integrator to allow for atom-dependent timesteps, providing a further reduction of $25\%$ in autocorrelation times.
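To make the roles of the timestep $\Delta t$, the number of integration steps $n$, and the jitter strength $s$ concrete, the following is a minimal sketch of HMC for the one-dimensional harmonic oscillator. It is not the authors' implementation: the function names, the unit choices ($k_B = 1$, mass and spring constant equal to 1), and in particular the jitter rule (a timestep drawn uniformly from $[(1-s)\Delta t, (1+s)\Delta t]$ once per trajectory) are assumptions made for illustration. The sketch is plain NumPy and is not differentiable; it only shows the sampler whose parameters the paper proposes to tune.

```python
import numpy as np


def leapfrog(x, v, dt, n, k=1.0, m=1.0):
    """Run n leapfrog steps for U(x) = k x^2 / 2 (force = -k x)."""
    v = v - 0.5 * dt * k * x / m          # initial half kick
    for _ in range(n - 1):
        x = x + dt * v                    # full drift
        v = v - dt * k * x / m            # full kick
    x = x + dt * v                        # last drift
    v = v - 0.5 * dt * k * x / m          # final half kick
    return x, v


def hmc_chain(n_samples, dt, n, T=0.5, s=0.25, k=1.0, m=1.0, seed=0):
    """Sample x from exp(-U(x)/T) with HMC; returns samples and diagnostics.

    The jitter rule below is an assumption for illustration, not taken
    from the paper: dt_i is uniform in [(1 - s) dt, (1 + s) dt].
    """
    rng = np.random.default_rng(seed)
    x = 0.0
    xs = np.empty(n_samples)
    n_accepted = 0
    sq_jump = 0.0
    for i in range(n_samples):
        v = rng.normal(scale=np.sqrt(T / m))            # resample velocity
        dt_i = dt * (1.0 + s * rng.uniform(-1.0, 1.0))  # jittered timestep
        x_new, v_new = leapfrog(x, v, dt_i, n, k, m)
        h_old = 0.5 * m * v**2 + 0.5 * k * x**2
        h_new = 0.5 * m * v_new**2 + 0.5 * k * x_new**2
        # Metropolis test on the total-energy change
        if rng.random() < np.exp(min(0.0, -(h_new - h_old) / T)):
            sq_jump += (x_new - x) ** 2
            x = x_new
            n_accepted += 1
        xs[i] = x
    return xs, n_accepted / n_samples, sq_jump / n_samples
```

Calling `hmc_chain(20000, dt=0.3, n=10)` returns the position samples, the acceptance rate, and the mean squared jump per proposal (rejected moves count as zero). Setting `s=0` recovers a fixed timestep, which is the case the abstract warns can lead to a rugged loss surface.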

Paper Structure (25 sections, 17 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: (a) Example trajectories for $x$ and $v$ of the one-dimensional harmonic oscillator for $\Delta t=0.1$ at $T=0.5$ as obtained from HMC with $n=100$. (b) Influence of the choice of $\Delta t$ on the simulated (shadow) Hamiltonian using otherwise the same parameters as in (a). The solid lines in the same color as the data points correspond to the analytically expected trajectories. The big dots symbolize the starting point of the trajectory, which sets the expected energy level.
  • Figure 2: (a) Acceptance $p$ and (b) logarithm of the squared jump $(x'_n-x)^2$ as a function of timestep $\Delta t$ and number of integration steps $n$ for the one-dimensional harmonic oscillator at $T=0.5$. In both cases, results are shown for the non-jittered ($s=0$) and jittered ($s=1/4$) timestep $\Delta t$.
  • Figure 3: (a) and (b) show the loss $L_N$ as a function of $\Delta t$ and $n$ for (a) no jittering ($s=0$) and (b) with jittering ($s=1/4$). In (c), we plot the logarithm of the autocorrelation time extracted from the time series of the potential energy. The region of desired small autocorrelation times corresponds reasonably well to the region where the loss is minimized, as shown in (b). (d) shows the same data as in (b), but the loss is rescaled with the computational effort $L_n/n$. Finally, in (e) we again show the logarithm of the autocorrelation time, but in units of the computational effort $n\tau$. The logarithm is chosen for the autocorrelation time to highlight the differences, as these are much larger than in the other plots for the losses.
  • Figure 4: (a) Loss surface as a function of $\Delta t$ and $n$. On top, three example trajectories show the expectation values of $\Delta t$ and $n$ during learning for three initial values of the timestep $\Delta_0 t$. (b) The corresponding loss as a function of epochs $t$ for the curves shown in (a). (c) Attention weights $c_n$ for $\Delta_0 t=0.1$ for different epochs indicated in the legend.
  • Figure 5: Graphical representation of alanine dipeptide, where the atoms are marked by their index. The white color symbolizes hydrogens, the green color stands for carbon, the red color represents oxygen and blue is for nitrogen.
  • ...and 4 more figures
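
The captions above compare the loss against the autocorrelation time extracted from a time series such as the potential energy. As a hedged illustration of how such a number can be obtained, the sketch below estimates an integrated autocorrelation time with an FFT-based autocorrelation function and a self-consistent summation window. The convention $\tau = 1/2 + \sum_{t \ge 1} \rho(t)$, the window constant `c`, and the function name are assumptions; the paper may use a different estimator or convention.

```python
import numpy as np


def integrated_autocorr_time(series, c=6.0):
    """Estimate tau = 1/2 + sum_{t>=1} rho(t) with a self-consistent window.

    The sum is cut at the smallest window M with M >= c * tau(M).
    Convention and window constant are assumptions for illustration.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-padded FFT gives the linear (non-circular) autocovariance.
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conjugate(f))[:n]
    acf = acf / acf[0]                      # normalize so rho(0) = 1
    taus = 0.5 + np.cumsum(acf[1:])         # taus[M-1] uses lags 1..M
    windows = np.arange(1, n)
    ok = windows >= c * taus
    m_cut = windows[ok][0] if ok.any() else n - 1
    return taus[m_cut - 1]
```

Under this convention, uncorrelated data gives a value close to 0.5, and an AR(1) process with coefficient $\phi$ gives $(1+\phi)/(2(1-\phi))$. Multiplying the estimate by $n$ expresses it in units of computational effort, as in the $n\tau$ panel of Figure 3.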