Deep Reinforcement Learning for Modelling Protein Complexes

Ziqi Gao, Tao Feng, Jiaxuan You, Chenyi Zi, Yan Zhou, Chen Zhang, Jia Li

TL;DR

This work tackles protein complex modelling (PCM) for large complexes by reframing assembly as a sequential decision process over an acyclic, undirected, connected graph (a tree) whose nodes are chains and whose edges are assembly actions; the search space grows as $N^{N-2}$ in the number of chains $N$. The authors introduce GAPN, a Generative Adversarial Policy Network trained by policy gradient (PPO) with domain-specific rewards and a novel adversarial reward that captures global assembly rules across complex scales. The method uses ESM-based chain embeddings and attention-driven action prediction, while a Graph Convolutional Network discriminator supplies the adversarial reward to improve cross-scale generalization. Empirically, GAPN is reported to achieve state-of-the-art accuracy (TM-Score and RMSD) and speed-ups of about $600\times$ over baselines while handling complexes of up to around 60 chains efficiently; ablations support the contribution of the adversarial component.

Abstract

AlphaFold can be used for both single-chain and multi-chain protein structure prediction, but the latter becomes extremely challenging as the number of chains increases. In this work, by taking each chain as a node and assembly actions as edges, we show that an acyclic undirected connected graph can be used to predict the structure of multi-chain protein complexes (i.e., protein complex modelling, PCM). However, two challenges remain: 1) The huge combinatorial optimization space of $N^{N-2}$ ($N$ is the number of chains) for the PCM problem can easily lead to high computational cost. 2) The scales of protein complexes exhibit distribution shift due to variance in chain numbers, which calls for generalization when modelling complexes of various scales. To address these challenges, we propose GAPN, a Generative Adversarial Policy Network powered by domain-specific rewards and an adversarial loss, trained through policy gradient for automatic PCM prediction. Specifically, GAPN learns to efficiently search through the immense assembly space and optimize the direct docking reward through policy gradient. Importantly, we design an adversarial reward function to enhance the receptive field of our model. In this way, GAPN simultaneously focuses on a specific batch of complexes and on the global assembly rules learned from complexes with varied chain numbers. Empirically, we achieve significant improvements in both accuracy (measured by RMSD and TM-Score) and efficiency compared to leading PCM software.
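To make the $N^{N-2}$ figure concrete: it matches Cayley's formula for the number of labelled trees on $N$ nodes, which is the natural count when each chain is a node and the assembly graph is acyclic, undirected and connected. The sketch below is illustrative only and is not taken from the paper; the function names `cayley_count` and `brute_force_tree_count` are hypothetical. It checks the formula by brute force for small $N$ and shows how quickly the count grows.

```python
from itertools import combinations


def cayley_count(n: int) -> int:
    """Number of labelled trees on n nodes by Cayley's formula, n^(n-2)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 if n <= 2 else n ** (n - 2)


def brute_force_tree_count(n: int) -> int:
    """Count spanning trees of the complete graph on n nodes by enumeration.

    Any acyclic set of n-1 edges on n nodes is a spanning tree, so we test
    every (n-1)-edge subset for cycles with a small union-find structure.
    Only feasible for small n; hypothetical helper for illustration.
    """
    all_edges = list(combinations(range(n), 2))
    count = 0
    for edges in combinations(all_edges, n - 1):
        parent = list(range(n))

        def find(x: int) -> int:
            # Follow parent pointers to the root, compressing the path.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for u, v in edges:
            root_u, root_v = find(u), find(v)
            if root_u == root_v:  # edge would close a cycle
                acyclic = False
                break
            parent[root_u] = root_v
        if acyclic:
            count += 1
    return count


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, cayley_count(n), brute_force_tree_count(n))
    # The count for a 30-chain complex is already a 42-digit number.
    print(len(str(cayley_count(30))))
```

The brute-force check is exponential and only meant to confirm the formula on tiny inputs; the growth of `cayley_count` is what motivates a learned search policy over exhaustive enumeration.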

Paper Structure (33 sections, 10 equations, 6 figures, 7 tables, 1 algorithm)

Figures (6)

  • Figure 1: The assembly process applied for the PCM problem. Each incoming chain has the opportunity to dock onto any already docked chain.
  • Figure 2: Overview of the proposed framework. It contains a Generative Adversarial Policy Network (GAPN) (b), which converts the input state $s_t$ into assembly action $a_t$ for the multi-chain assembly process (d). The GAPN is trained with the help of a value network (a), which estimates values arising from two sources: domain-specific rewards obtained through the multi-chain assembly process (d) and adversarial rewards from the adversarial reward function (c).
  • Figure 3: PCM performance analysis. (a) TM-Score and (b) RMSD distributions of all baselines on multimers of $3\leq N \leq 10$. (c) TM-Score distribution of MoLPC and GAPN on large-scale multimers of $11\leq N \leq 30$, with the median TM-Score marked by ✕.
  • Figure 4: (a) RMSD performance of GAPN with and without the adversarial reward (AR). The bar chart shows the relative difference between the two for multimers of each scale. (b) Training curves of GAPN with and without AR. Both models are trained on multimers of $3\leq N \leq 30$ with GT dimer structures.
  • Figure 5: Knowledge gap analysis.
  • ...and 1 more figures
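
The assembly process described in the Figure 1 caption (each incoming chain docks onto some already docked chain) can be sketched as a sequence of actions that builds a tree. The code below is a minimal illustration under stated assumptions, not the authors' method: the function `random_assembly` is hypothetical, chains are plain integer indices, and a uniform random choice stands in for the learned GAPN policy. No docking geometry or reward is modelled.

```python
import random
from typing import List, Tuple


def random_assembly(num_chains: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Return a list of (incoming_chain, docked_chain) assembly actions.

    Assumption: a uniform random choice replaces the learned policy.
    The resulting edges always form a tree over the chains, because every
    incoming chain attaches to exactly one chain that is already docked.
    """
    if num_chains < 1:
        raise ValueError("num_chains must be positive")
    rng = random.Random(seed)
    order = list(range(num_chains))
    rng.shuffle(order)            # order in which chains arrive

    docked = [order[0]]           # the first chain seeds the complex
    actions: List[Tuple[int, int]] = []
    for incoming in order[1:]:
        target = rng.choice(docked)   # placeholder for the policy's action
        actions.append((incoming, target))
        docked.append(incoming)
    return actions


if __name__ == "__main__":
    for incoming, target in random_assembly(6, seed=1):
        print(f"dock chain {incoming} onto chain {target}")
```

A complex of $N$ chains always yields $N-1$ actions in this sketch, which is why the assembly can be read as a fixed-length sequential decision problem.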