
SurfPro: Functional Protein Design Based on Continuous Surface

Zhenqiao Song, Tinglin Huang, Lei Li, Wengong Jin

TL;DR

SurfPro is a surface-to-sequence design framework that generates functional proteins by reasoning jointly over the geometry and biochemistry of a continuous protein surface. Its hierarchical encoder first models local surface regions and then the global surface landscape (local, then global FAMHA layers), capturing both the surface shape and the biochemical properties attached to it; an autoregressive Transformer decoder then predicts the amino acid sequence. On CATH 4.2 inverse folding and two functional design tasks (binder design and enzyme design), SurfPro achieves state-of-the-art sequence recovery and favorable binding and interaction scores, and pretraining on PDB surfaces further improves performance. The approach enables end-to-end functional protein design from surface cues, offering a scalable route to rapid protein engineering and discovery.
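
The two-stage encoder (local modeling, then global modeling) can be illustrated with a minimal sketch. This is not the authors' FAMHA implementation: it assumes plain single-head scaled dot-product attention with no learned projections, a Euclidean k-nearest-neighbor neighborhood for the local stage, and a hypothetical per-vertex feature vector (for example, shape and biochemical values concatenated). The names `attend`, `knn`, and `hierarchical_encode` are illustrative only.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Single-head scaled dot-product attention for one query (no learned projections)."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    # Softmax-weighted average of the value vectors.
    return [sum(w * v[c] for w, v in zip(weights, values)) / total for c in range(dim)]


def knn(coords: List[Vec], i: int, k: int) -> List[int]:
    """Indices of the k surface vertices closest to vertex i (i itself included)."""
    def sq_dist(j: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
    return sorted(range(len(coords)), key=sq_dist)[:k]


def hierarchical_encode(coords: List[Vec], feats: List[Vec], k: int) -> List[Vec]:
    # Local stage (hypothetical stand-in for local FAMHA):
    # each vertex attends only to its k nearest surface neighbors.
    local = []
    for i in range(len(coords)):
        nbrs = [feats[j] for j in knn(coords, i, k)]
        local.append(attend(feats[i], nbrs, nbrs))
    # Global stage (hypothetical stand-in for global FAMHA):
    # every vertex attends to all locally encoded vertices.
    return [attend(h, local, local) for h in local]
```

Restricting the first stage to a small neighborhood keeps it focused on local surface patches, while the second stage lets information flow across the whole surface; in a real model each stage would use learned projections and multiple heads.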

Abstract

How can we design proteins with desired functions? We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models the geometric shape and biochemical features of a protein surface, and an autoregressive decoder that produces an amino acid sequence. We evaluate SurfPro on the standard inverse folding benchmark CATH 4.2 and on two functional protein design tasks: protein binder design and enzyme design. SurfPro consistently surpasses previous state-of-the-art inverse folding methods, achieving a recovery rate of 57.78% on CATH 4.2 and higher success rates in terms of protein-protein binding and enzyme-substrate interaction scores.
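
Recovery rate, the headline CATH 4.2 metric above, is the fraction of positions at which a designed sequence reproduces the native residue. The sketch below assumes the designed and native sequences are already aligned position by position (same length, as in inverse folding); how per-protein values are aggregated over the test set (mean or median) is not stated here. Novelty follows the Figure 3 caption, defined as 1 minus the recovery rate. Function names are illustrative.

```python
def recovery_rate(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def novelty(designed: str, native: str) -> float:
    """Novelty as in the Figure 3 caption: 1 - recovery rate."""
    return 1.0 - recovery_rate(designed, native)


# 5 of 6 positions match the native sequence.
print(f"{recovery_rate('ACDEFG', 'ACDKFG'):.2%}")  # prints 83.33%
```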

Paper Structure

This paper contains 32 sections, 9 equations, 6 figures, and 15 tables.

Figures (6)

  • Figure 1: Problem setups of protein design. (a) Inverse folding: protein design conditioned on geometric constraints only. (b) Surface based design: protein design conditioned on both geometric shape and biochemical properties.
  • Figure 2: (a) Overview of the proposed SurfPro. (b) Left: local perspective modeling; right: global landscape modeling.
  • Figure 3: (a) Novelty (1 − recovery rate) of designed enzymes with temperature=0.1. (b) Recovery rates on the CATH 4.2 dataset for models with varying numbers of sampled surface vertices.
  • Figure 4: Amino acid distribution at different positions in designed binders for the target protein InsulinR.
  • Figure 5: Case study of complexes involving our SurfPro designed binders (in red) and target proteins (in purple): (a) target protein TrkA with pAE_interaction=4.75, (b) target protein PDGFR with pAE_interaction=5.58.
  • ...and 1 more figure
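
Figure 3(a) reports enzymes designed with temperature=0.1. A common way an autoregressive decoder uses such a temperature is to sample each residue from softmax(logits / T), conditioning every step on the prefix generated so far; the sketch below assumes that convention, since this page does not spell out SurfPro's decoding procedure. `next_logits` is a hypothetical stand-in for the trained decoder conditioned on the encoded surface, and `temperature_softmax` and `decode` are illustrative names.

```python
import math
import random
from typing import Callable, Dict


def temperature_softmax(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn per-residue logits into probabilities, sharpened (T < 1) or flattened (T > 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {aa: v / temperature for aa, v in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {aa: math.exp(s - top) for aa, s in scaled.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def decode(next_logits: Callable[[str], Dict[str, float]], length: int,
           temperature: float, seed: int = 0) -> str:
    """Left-to-right sampling: each residue is drawn given the prefix generated so far."""
    rng = random.Random(seed)
    seq = ""
    for _ in range(length):
        probs = temperature_softmax(next_logits(seq), temperature)
        seq += rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
    return seq
```

With a low temperature such as 0.1, almost all probability mass goes to the highest-scoring residue, so sampling behaves close to greedy decoding; raising the temperature trades recovery for sequence diversity.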