Protein Structure Prediction in the 3D HP Model Using Deep Reinforcement Learning

Giovanny Espitia; Yui Tik Pang; James C. Gumbart

TL;DR

The study addresses protein structure prediction in the 3D HP lattice model, framing folding as energy minimization over hydrophobic contacts with $E = -\big(\text{number of valid H-H contacts}\big)$. It introduces two deep reinforcement learning (DRL) architectures, a reservoir-based hybrid (FFNN-R) and an LSTM with multi-head attention (LSTM-A), both trained under a stabilized Deep Q-Learning framework. For short sequences (under 36 residues), FFNN-R converges faster, needing about 25% fewer training episodes. For longer sequences, LSTM-A captures long-range dependencies and matches best-known energy values, at the cost of higher compute and memory demands. The results highlight complementary strengths: efficient local pattern learning by FFNN-R and robust long-range modeling by LSTM-A. This points to hybrid designs as a promising direction for scalable protein folding in lattice models.
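The energy definition above can be illustrated with a minimal sketch. It assumes the common HP convention that a "valid" contact is a pair of H residues on neighbouring lattice sites that are not adjacent along the chain; the function name `hp_energy` and the coordinate format are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def hp_energy(sequence, coords):
    """Return E = -(number of H-H contacts) for a 3D cubic-lattice conformation.

    sequence: string of 'H' and 'P' characters.
    coords:   list of (x, y, z) integer tuples, one per residue.
    Assumption: a contact is a non-consecutive H-H pair at Manhattan distance 1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    contacts = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # chain neighbours never count as contacts
        if sequence[i] == "H" and sequence[j] == "H":
            distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if distance == 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a square brings its two end residues together.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```

The pairwise loop is quadratic in chain length, which is adequate for the short sequences discussed here; a dictionary from lattice site to residue index would make it linear.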

Abstract

We address protein structure prediction in the 3D Hydrophobic-Polar lattice model through two novel deep learning architectures. For proteins under 36 residues, our hybrid reservoir-based model combines fixed random projections with trainable deep layers, achieving optimal conformations with 25% fewer training episodes. For longer sequences, we employ a long short-term memory network with multi-headed attention, matching best-known energy values. Both architectures leverage a stabilized Deep Q-Learning framework with experience replay and target networks, demonstrating consistent achievement of optimal conformations while significantly improving training efficiency compared to existing methods.
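The two stabilization techniques named in the abstract, experience replay and a target network, can be sketched as follows. This is a minimal illustration of the standard Deep Q-Learning target, not the authors' implementation: `ReplayBuffer`, `td_targets`, the `target_q` callable, and the discount factor `gamma` are all assumed names and values.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """Compute y = r + gamma * max_a' Q_target(s', a') for each experience.

    target_q is a hypothetical callable mapping a state to a list of Q-values,
    standing in for a periodically synchronized copy of the Q-network.
    Terminal transitions (done=True) use the reward alone.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In a full training loop the online network would be regressed toward these targets, and `target_q` would be refreshed from the online network every fixed number of steps.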

Paper Structure (17 sections, 6 equations, 11 figures, 4 tables)

Introduction and Related Work
Methodology
Modeling the problem in a cubic lattice
DRL Setup
Markov Decision Process Formulation
Deep Q-Learning with Stabilization Techniques
State Representation
Q-Network Architectures
Hybrid-Reservoir
LSTM with Multi-Head-Attention
Experiments and Results
Efficiency
Discussion
Limitations and Future Directions
Conclusion
...and 2 more sections

Figures (11)

Figure 1: Deep reinforcement learning training loop. In a), a batch of experiences is sampled from the replay buffer. The batch then serves as input to the Q-network in b). Based on the output Q-value tensor, the agent in c) takes the action $a_{t+1}$ with the greatest Q-value. In d), the resulting experience is stored in the replay memory.
Figure 2: The input layer takes an (N, 8, 1) tensor representing the state at a particular timestep. The reservoir is a randomly initialized weight matrix with a topology specified beforehand. The linear layers consist of a simple fully connected feed-forward neural network. The output is a (5, 1) tensor representing the Q-value, or expected total future reward, per action.
Figure 3: LSTM-A architecture for protein folding. Sequential states are processed through LSTM cells, generating hidden states that are weighted by an 8-head attention mechanism. The attention output is mapped to action Q-values through a fully connected layer, enabling the model to leverage both sequential patterns and long-range dependencies.
Figure 4: Least Energy Conformations for different sequences.
Figure 5: Plots a) 3d1 and b) 3d5 show the minimum conformation energy as a function of episode.
...and 6 more figures
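
The shapes given in the Figure 2 caption, together with the greedy action choice in Figure 1, can be sketched as a small forward pass. This is a simplified illustration under stated assumptions: the reservoir is modelled as a single fixed random projection with a `tanh` nonlinearity, the trainable layers are reduced to one linear readout, and the class name `ReservoirQNet`, the reservoir size, and the weight scales are hypothetical.

```python
import math
import random

N_ACTIONS = 5  # size of the Q-value output given in the Figure 2 caption


def random_matrix(rows, cols, rng, scale=1.0):
    """Build a rows x cols matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ReservoirQNet:
    def __init__(self, n_inputs, n_reservoir=32, seed=0):
        rng = random.Random(seed)
        # Fixed random projection: initialized once and never updated.
        self.w_res = random_matrix(n_reservoir, n_inputs, rng)
        # Readout weights: the only parameters a training step would update.
        self.w_out = random_matrix(N_ACTIONS, n_reservoir, rng, scale=0.1)

    def q_values(self, state):
        """Map an N x 8 state (list of rows) to one Q-value per action."""
        flat = [x for row in state for x in row]
        hidden = [math.tanh(h) for h in matvec(self.w_res, flat)]
        return matvec(self.w_out, hidden)

    def greedy_action(self, state):
        """Pick the action index with the greatest Q-value."""
        q = self.q_values(state)
        return max(range(N_ACTIONS), key=q.__getitem__)


# Usage: a 4-residue state with 8 features per residue.
net = ReservoirQNet(n_inputs=4 * 8)
state = [[1.0 if k == i else 0.0 for k in range(8)] for i in range(4)]
print(len(net.q_values(state)), net.greedy_action(state))
```

Because the reservoir weights stay fixed, only the readout needs gradient updates, which is one plausible reason such a model could train in fewer episodes on short sequences.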
