Full-Atom Peptide Design based on Multi-modal Flow Matching

Jiahan Li, Chaoran Cheng, Zuofan Wu, Ruihan Guo, Shitong Luo, Zhizhou Ren, Jian Peng, Jianzhu Ma

TL;DR

This work presents PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for designing full-atom peptides that target specific protein receptors. Such peptides offer substantial potential in drug discovery.

Abstract

Peptides, short chains of amino acid residues, play a vital role in numerous biological processes by interacting with other target molecules, offering substantial potential in drug discovery. In this work, we present PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides that target specific protein receptors. Drawing inspiration from the crucial roles of residue backbone orientations and side-chain dynamics in protein-peptide interactions, we characterize the peptide structure using rigid backbone frames within the $\mathrm{SE}(3)$ manifold and side-chain angles on high-dimensional tori. Furthermore, we represent discrete residue types in the peptide sequence as categorical distributions on the probability simplex. By learning the joint distributions of each modality using derived flows and vector fields on corresponding manifolds, our method excels in the fine-grained design of full-atom peptides. Harnessing the multi-modal paradigm, our approach adeptly tackles various tasks such as fixed-backbone sequence design and side-chain packing through partial sampling. Through meticulously crafted experiments, we demonstrate that PepFlow exhibits superior performance in comprehensive benchmarks, highlighting its significant potential in computational peptide design and analysis.

Paper Structure (75 sections, 35 equations, 10 figures, 5 tables, 2 algorithms)

Figures (10)

  • Figure 1: Left. A peptide binds to its target protein receptor, highlighting the pivotal role of backbone orientations and side-chain interactions among key residues. Right. Every protein residue consists of backbone atoms and side-chain atoms. The backbone atoms establish a rigid frame, whereas the side-chain atoms contribute to flexible side-chain angles.
  • Figure 2: Illustration of PepFlow Architecture. The encoder encodes the receptor as the context for peptide generation. Flows for four different modalities are then constructed: spherical for the orientation $R$, Euclidean for the translation $\mathbf{x}$, toric for the torsion angles $\chi_k$, and categorical for the type distribution $p$. The multi-modal flow matching decoder finally recovers the full-atom peptide structure and sequence iteratively using the Euler method.
  • Figure 3: Left: RMSD of designed peptides of different lengths (short: 3-9, medium: 10-14, long: 15-25). Middle: Ramachandran plot of PepFlow-generated and native peptides. Right: Binding energy distributions of generated and native peptides (lower is better).
  • Figure 4: Three examples of the generated peptides. Top: native peptides; Bottom: generated peptides. PDB: 3MXY, 6OX4, 5DJY.
  • Figure 5: Length distribution of the peptide in our dataset.
  • ...and 5 more figures
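
The Figure 2 caption mentions Euclidean, toric, and categorical flows integrated with the Euler method. The following is a minimal sketch of that idea, not the paper's implementation. It assumes a straight-line (geodesic) path from a starting sample to a known target, and it uses the closed-form conditional velocity `(target - x) / (1 - t)` in place of the learned network. The function names `wrap_angle` and `euler_sample` are hypothetical, and the rotation (spherical) flow is omitted for brevity.

```python
import math


def wrap_angle(a):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def euler_sample(x0, x1, steps, on_torus=False):
    """Integrate dx/dt = (x1 - x) / (1 - t) from t=0 to t=1 with Euler steps.

    Assumption: the target x1 is known, so the velocity is the closed-form
    conditional vector field of a linear path, standing in for a trained model.
    With on_torus=True, differences follow the shortest arc and the state is
    wrapped back onto the torus after every step.
    """
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k / steps
        for i in range(len(x)):
            diff = x1[i] - x[i]
            if on_torus:
                diff = wrap_angle(diff)  # shortest signed angular difference
            x[i] += dt * diff / (1.0 - t)
            if on_torus:
                x[i] = wrap_angle(x[i])
    return x


# Euclidean flow for a translation vector.
translation = euler_sample([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], steps=10)

# Toric flow for one side-chain angle; the path crosses the +/- pi boundary.
chi = euler_sample([3.0], [-3.0], steps=10, on_torus=True)

# Categorical flow on the simplex: uniform start, one-hot target.
probs = euler_sample([0.25, 0.25, 0.25, 0.25], [0.0, 1.0, 0.0, 0.0], steps=10)
```

Each Euler step here is a convex combination of the current state and the target, so the categorical state stays on the probability simplex throughout. In a trained model the velocity would come from the network's prediction rather than from a known target.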