Reinforcement Learning for Molecular Dynamics Optimization: A Stochastic Pontryagin Maximum Principle Approach

Chandrajit Bajaj; Minh Nguyen; Conrad Li

TL;DR

A novel reinforcement learning framework designed to optimize molecular dynamics by focusing on the entire trajectory rather than just the final molecular configuration, which makes it suitable for applications in areas such as drug discovery and molecular design.

Abstract

In this paper, we present a novel reinforcement learning framework designed to optimize molecular dynamics by focusing on the entire trajectory rather than just the final molecular configuration. Leveraging a stochastic version of Pontryagin's Maximum Principle (PMP) and the Soft Actor-Critic (SAC) algorithm, our framework explores non-convex molecular energy landscapes, escaping local minima to stabilize in low-energy states. Our approach operates in continuous state and action spaces without relying on labeled data, making it applicable to a wide range of molecular systems. Through experiments on six distinct molecules, including Bradykinin and Oxytocin, we demonstrate competitive performance against other unsupervised physics-based methods, such as the Greedy and NEMO-based algorithms. Our method's adaptability and focus on dynamic trajectory optimization make it suitable for applications in areas such as drug discovery and molecular design.

Paper Structure (11 sections, 1 theorem, 19 equations, 7 figures, 1 table, 1 algorithm)

Introduction
Problem statement and reinforcement learning formulation
Problem statement
Reinforcement learning formulation
Soft actor-critic
Reward function via stochastic Pontryagin Maximum Principle
Soft Actor-Critic Algorithm Implementation
Experiments
Conclusions
Acknowledgments
Proof of Lemma \ref{lem:dynamic_form}

Key Result

Lemma

Let $\Gamma(t)$ be a time-dependent function. For a reward function of the form $r(s, a) = \Gamma(t)(\pm a^2/2 - V(s))$, the resulting optimal dynamics follow a trajectory governed by a stochastic equation involving white noise $W_t$ and a constant $K$ related to the covariance $\sigma$.
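The reward form stated in the lemma can be illustrated with a minimal sketch. The names `reward`, `harmonic_potential`, and `exp_weight` are hypothetical and not taken from the paper; the sketch assumes $a^2$ means the squared Euclidean norm of the action vector, and it uses a toy harmonic potential and an exponential $\Gamma(t)$ purely as placeholders for the molecular potential and the weighting the authors actually use.

```python
import math

def reward(s, a, t, V, gamma, sign=-1.0):
    """Evaluate r(s, a) = Gamma(t) * (sign * |a|^2 / 2 - V(s)).

    `sign` selects the +/- branch of the action term (assumed +1.0 or -1.0).
    """
    action_term = 0.5 * sum(x * x for x in a)  # |a|^2 / 2, assuming a is a vector
    return gamma(t) * (sign * action_term - V(s))

def harmonic_potential(s):
    # Toy stand-in for the molecular potential energy V(s); an assumption.
    return 0.5 * sum(x * x for x in s)

def exp_weight(t, rate=0.1):
    # Hypothetical time-dependent weight Gamma(t); the paper's choice may differ.
    return math.exp(rate * t)

state = [1.0, 2.0]
action = [0.5, 0.0]
r_minus = reward(state, action, 0.0, harmonic_potential, exp_weight, sign=-1.0)
r_plus = reward(state, action, 0.0, harmonic_potential, exp_weight, sign=+1.0)
print(r_minus, r_plus)
```

With the minus sign, larger actions and higher potential energy both lower the reward, so an agent maximizing it is pushed toward low-energy states with small control effort; $\Gamma(t)$ only rescales the reward over time.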

Figures (7)

Figure 1: Snapshots from a learned episode for Bradykinin molecule
Figure 2: Snapshots from a learned episode for CLN025 molecule
Figure 3: Snapshots from a learned episode for Met-enkephalin molecule
Figure 4: Energy evolution over time across 100 learned episodes, illustrating the system's stabilization towards low-energy configurations. The $x$-axis represents time steps, while the $y$-axis shows the potential energy values. Red curves mark poorly learned episodes with high final energy and large variance.
Figure 5: Snapshots from a learned episode for Oxytocin molecule
...and 2 more figures

Theorems & Definitions (2)

Lemma
Remark
