NS-Pep: De novo Peptide Design with Non-Standard Amino Acids

Tao Guo, Junbo Yin, Yu Wang, Xin Gao

TL;DR

NS-Pep tackles de novo peptide design with non-standard amino acids (NSAAs) through a unified flow-matching framework that co-generates peptide sequence and structure conditioned on a target protein pocket. It combines three components: Residue Frequency-Guided Modification (RFGM), which counteracts the long-tailed NSAA distribution; Progressive Side-chain Perception (PSP), which models side chains from coarse to fine; and Interaction-Aware Weighting (IAW), which emphasizes pocket-proximal residues. Together, these let the model design NSAA-containing peptides and extend naturally to NSAA-aware peptide folding. Empirically, NS-Pep improves amino acid recovery and binding affinity and outperforms AlphaFold3 in pocket-specific folding, and ablations confirm that each module contributes. The authors note limited generalization to ultra-rare NSAAs, suggest few-shot learning and reinforcement learning as future directions, and discuss responsible use in therapeutic peptide design.
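The flow-matching idea behind the framework can be illustrated on coordinates alone. The following is a minimal, generic conditional flow-matching sketch with a straight-line (rectified-flow) path over hypothetical per-residue 3D points; it is not NS-Pep's implementation, and it omits the pocket conditioning, residue orientations, side-chain torsions, and the discrete sequence channel that NS-Pep co-generates. All function names are illustrative.

```python
# Generic conditional flow matching on 3D coordinates with a straight-line path.
# Hypothetical toy representation: one (x, y, z) point per peptide residue.
Vec = tuple[float, float, float]


def interpolate(x0: list[Vec], x1: list[Vec], t: float) -> list[Vec]:
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [tuple((1 - t) * a + t * b for a, b in zip(p0, p1)) for p0, p1 in zip(x0, x1)]


def target_velocity(x0: list[Vec], x1: list[Vec]) -> list[Vec]:
    """Regression target for a velocity network: u = x1 - x0 (constant along the path)."""
    return [tuple(b - a for a, b in zip(p0, p1)) for p0, p1 in zip(x0, x1)]


def fm_loss(pred_v: list[Vec], x0: list[Vec], x1: list[Vec]) -> float:
    """Mean squared error between a predicted velocity and the target velocity."""
    u = target_velocity(x0, x1)
    total = sum((pv - uv) ** 2 for p, q in zip(pred_v, u) for pv, uv in zip(p, q))
    return total / (3 * len(u))


def euler_sample(velocity_fn, x0: list[Vec], steps: int = 10) -> list[Vec]:
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (sample) with Euler steps."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [tuple(a + dt * b for a, b in zip(p, q)) for p, q in zip(x, v)]
    return x
```

In training, a network would predict the velocity at `(x_t, t)` and be fit with `fm_loss`; at sampling time, `euler_sample` integrates the learned field. Plugging in the oracle field `v(x, t) = (x1 - x) / (1 - t)` recovers `x1`, which is a quick sanity check of the integrator.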

Abstract

Peptide drugs incorporating non-standard amino acids (NSAAs) offer improved binding affinity and pharmacological properties. However, existing peptide design methods are limited to standard amino acids, leaving NSAA-aware design largely unexplored. We introduce NS-Pep, a unified framework for co-designing peptide sequences and structures with NSAAs. The main challenge is that NSAAs are extremely underrepresented (even the most frequent one, SEP, accounts for less than 0.4% of residues), resulting in a severe long-tailed distribution. To improve generalization to rare amino acids, we propose Residue Frequency-Guided Modification (RFGM), which mitigates the over-penalization of rare residues through frequency-aware logit calibration, supported by both theoretical and empirical analysis. Furthermore, we find that insufficient side-chain modeling limits the geometric representation of NSAAs. To address this, we introduce Progressive Side-chain Perception (PSP) for coarse-to-fine torsion and location prediction, and Interaction-Aware Weighting (IAW) to emphasize pocket-proximal residues. Moreover, NS-Pep generalizes naturally to the peptide folding task with NSAAs, addressing a major limitation of current tools. Experiments show that NS-Pep improves sequence recovery rate and binding affinity by 6.23% and 5.12%, respectively, and outperforms AlphaFold3 by 17.76% in peptide folding success rate.
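Frequency-aware logit calibration can be illustrated with the standard logit-adjustment technique for long-tailed classification. The sketch below is that well-known technique, used here as a stand-in: the exact RFGM formula is not given in this summary, and the helper names, the temperature `tau`, and the residue counts are hypothetical. Each class receives a non-negative offset that grows with its training frequency, so during training a rare residue type is pushed down less when it is not the target.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def frequency_calibrated_logits(z: list[float], counts: list[int], tau: float = 1.0) -> list[float]:
    """Add a non-negative, frequency-aware offset to each logit (training only).

    offset_k = tau * (log pi_k - min_j log pi_j) >= 0: frequent classes get larger
    offsets and the rarest class gets 0. Because softmax ignores a constant shift,
    this matches classic logit adjustment z_k + tau * log pi_k.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive (add a pseudocount for unseen classes)")
    total = sum(counts)
    log_prior = [math.log(c / total) for c in counts]
    floor = min(log_prior)
    return [zk + tau * (lp - floor) for zk, lp in zip(z, log_prior)]


def ce_grad(z: list[float], target: int) -> list[float]:
    """Gradient of cross-entropy w.r.t. the logits: softmax(z) - onehot(target)."""
    p = softmax(z)
    return [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
```

With hypothetical counts `[9000, 900, 100]` and uniform logits, the plain gradient on the rarest class is 1/3 when class 0 is the target, while the calibrated gradient drops to about 0.01. As in standard logit adjustment, the offsets would be applied only in the training loss, with raw logits used at inference.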

Paper Structure

This paper contains 30 sections, 1 theorem, 19 equations, 9 figures, 8 tables, 1 algorithm.

Key Result

Lemma 1

Let $k$ be any class, and suppose a non-negative perturbation $\epsilon \geq 0$ is added only to its logit, giving $z_k' = z_k + \epsilon$, while all other logits remain unchanged. Then the gradient of the cross-entropy loss with respect to $z_k$ satisfies …, where $\mathbf{z}'$ denotes the updated logit vector after perturbation.
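As background for Lemma 1, the textbook gradient of softmax cross-entropy shows how a non-negative perturbation of a single logit moves the gradients. This is a standard derivation, not the paper's own proof, and the target-class symbol $y$ is introduced here for illustration:

```latex
\begin{aligned}
p_j(\mathbf{z}) &= \frac{e^{z_j}}{\sum_i e^{z_i}}, \qquad
\mathcal{L}(\mathbf{z}) = -\log p_y(\mathbf{z}), \qquad
\frac{\partial \mathcal{L}}{\partial z_k}(\mathbf{z}) = p_k(\mathbf{z}) - \mathbb{1}[k = y], \\
\frac{\partial p_k}{\partial z_k} &= p_k (1 - p_k) \ge 0
\;\Longrightarrow\; p_k(\mathbf{z}') \ge p_k(\mathbf{z})
\quad \text{for } z_k' = z_k + \epsilon,\ \epsilon \ge 0, \\
\frac{\partial \mathcal{L}}{\partial z_k}(\mathbf{z}') - \frac{\partial \mathcal{L}}{\partial z_k}(\mathbf{z})
&= p_k(\mathbf{z}') - p_k(\mathbf{z}) \ge 0,
\qquad p_j(\mathbf{z}') \le p_j(\mathbf{z}) \quad (j \ne k).
\end{aligned}
```

Read this way, raising $z_k$ shrinks the magnitude $1 - p_k$ of the gradient on $k$ when $k$ is the target, and lowers $p_j$, the gradient that pushes down every other non-target class $j$.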

Figures (9)

  • Figure 1: (a) Naively substituting NSAAs (e.g., PTR and SEP) with standard ones (e.g., TYR and SER) results in suboptimal conformations. (b) Distributions of binding affinity before (blue) and after (red) NSAA substitution. (c) Distribution of standard and non-standard amino acids.
  • Figure 2: Overview of NS-Pep. (a) The model takes a noised peptide and a protein pocket as input, and generates the peptide sequence and structure via flow matching. (b) NS-Pep supports both NSAA-containing peptide co-design and folding with pocket conditioning. (c) RFGM facilitates the learning of long-tailed distributions. (d) PSP for coarse-to-fine side-chain structure modeling. (e) IAW guides peptide generation toward functionally crucial hotspot residues.
  • Figure 3: Interpretation of NS-Pep's generation for PDB entry 3RM0.
  • Figure 4: Visualization of experimental results. (a) Two generated peptides compared with their corresponding native ones (cases: 1BMB, 3SHA). (b) Top: folding performance of four models, stratified by peptide length. Bottom: an example in which AlphaFold3 predicts a peptide that fails to bind the target pocket. (c) Distribution of binding affinity; NS-Pep(S) denotes NS-Pep's generated peptides with all NSAAs substituted by their standard counterparts.
  • Figure 5: Additional peptides generated by NS-Pep
  • ...and 4 more figures
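The IAW idea of emphasizing pocket-proximal residues (Figure 2e) can be pictured as a distance-based reweighting of a per-residue training loss. This is a minimal sketch under stated assumptions: the exponential decay with distance to the nearest pocket atom, the floor that keeps distal residues contributing, and the `sigma`/`floor` values are all hypothetical choices, not taken from the paper.

```python
import math


def interaction_aware_weights(peptide_xyz, pocket_xyz, sigma=4.0, floor=0.1):
    """Hypothetical per-residue weights that grow as a residue approaches the pocket.

    w_i = floor + (1 - floor) * exp(-d_i / sigma), where d_i is the distance (in Å)
    from peptide residue i to its nearest pocket atom, so floor <= w_i <= 1.
    """
    weights = []
    for p in peptide_xyz:
        d = min(math.dist(p, q) for q in pocket_xyz)
        weights.append(floor + (1.0 - floor) * math.exp(-d / sigma))
    return weights


def weighted_loss(per_residue_loss, weights):
    """Weighted mean of per-residue losses (weights renormalized to sum to 1)."""
    total_w = sum(weights)
    return sum(l * w for l, w in zip(per_residue_loss, weights)) / total_w
```

Normalizing by the total weight keeps the loss scale independent of peptide length, while the floor prevents residues far from the pocket from being ignored entirely.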

Theorems & Definitions (2)

  • Lemma 1
  • Proof