Non-Canonical Crosslinks Confound Evolutionary Protein Structure Models

Romain Lacombe

TL;DR

The study addresses the challenge of predicting non-canonical crosslinks in proteins when evolutionary priors are weak or unavailable. It introduces a zero-shot benchmark based on sactipeptides, focusing on sulfur-to-α-carbon crosslinks, and evaluates six modern structure-prediction models. The results show limited performance (average GDT-TS ≈ $11.5\%$ on known crosslinks and RMSD ≈ $12.1$ Å), with minor gains for unknown sequences, and reveal a bias toward disulfide-like linkages rather than true sactibonds, underscoring gaps in handling rare PTMs. The work argues for physics-informed models and PTM-aware training to generalize to RiPPs and other modifications beyond well-represented evolutionary data, guiding future directions in biomolecular structure prediction.

Abstract

Evolution-based protein structure prediction models have achieved breakthrough success in recent years. However, they struggle to generalize beyond evolutionary priors and to sequences lacking rich homologous data. Here we present a novel, out-of-domain benchmark based on sactipeptides, a rare class of ribosomally synthesized and post-translationally modified peptides (RiPPs) characterized by sulfur-to-$α$-carbon thioether bridges that cross-link cysteine residues to the backbone. We evaluate recent models on predicting conformations compatible with these cross-links for the 10 known sactipeptides with elucidated post-translational modifications. Crucially, the structures of 5 of them have not yet been experimentally resolved. This makes the task a challenging problem for evolution-based models, which we find exhibit limited performance (0.0% to 19.2% GDT-TS on sulfur-to-$α$-carbon distance). Our results point to the need for physics-informed models to sustain progress in biomolecular structure prediction.
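
To make the headline metric concrete, here is a minimal sketch of a GDT-TS-style score applied to cross-link distances. It assumes the standard GDT-TS cutoffs of 1, 2, 4, and 8 Å and takes as input the absolute deviation, per cross-link, between the predicted sulfur-to-α-carbon distance and a reference distance. The function name `gdt_ts` and the example deviations are hypothetical, and the paper's exact definition may differ from this one.

```python
from typing import Sequence

# Standard GDT-TS distance cutoffs, in angstroms.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(deviations: Sequence[float],
           cutoffs: Sequence[float] = GDT_TS_CUTOFFS) -> float:
    """Return a GDT-TS-style score in percent.

    `deviations` holds one non-negative distance error (in angstroms) per
    cross-link. The score is the mean, over the cutoffs, of the fraction
    of cross-links whose error is within that cutoff.
    """
    if len(deviations) == 0:
        raise ValueError("at least one deviation is required")
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical deviations for four cross-links (illustrative, not paper data).
example_score = gdt_ts([0.5, 3.0, 6.0, 12.0])
print(f"GDT-TS-style score: {example_score:.2f}%")
```

Under this definition, a score near 0% means almost no predicted sulfur-to-α-carbon distance falls within even 8 Å of the reference, which is the regime the reported 0.0% to 19.2% range describes.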

Paper Structure

This paper contains 14 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Example of a sactipeptide (huazacin) and its post-translationally modified sulfur-to-alpha-carbon thioether bonds (blue nested hairpins). Figure from Huazacin.
  • Figure 2: Experimental results. We report metrics for proteins with an experimentally determined 3D structure ('known'), and out-of-domain sequences without a known structure ('unknown').
  • Figure 3: Ruminococcin C1 structure predicted by AlphaFold 3: (a) superimposed on the experimentally determined structure (left); (b) highlighting erroneously predicted disulfide bonds (right).