Discontinuous Epitope Fragments as Sufficient Target Templates for Efficient Binder Design

Zhenfeng Deng, Ruijie Hou, Ningrui Xie, Mike Tyers, Michał Koziarski

TL;DR

This work tackles the challenge of designing protein binders for large, multi-domain targets by leveraging a local-first interpretation of protein folding neural network (PFNN) energy functions. It introduces an epitope-only hallucination strategy that trims targets to the discontinuous surface residues around the binding site, paired with Monte Carlo-based evolution and per-residue biased ProteinMPNN redesign to improve local sequence features. Compared to intact-domain workflows, the approach improves in silico success rates by up to 80% and reduces the average time per successful design by up to forty-fold, enabling binders for previously intractable targets such as ClpP and ALS3. These results support the notion that local interactions dominate PFNN-guided design and offer a scalable framework for expanding the therapeutic scope of de novo protein binders, while highlighting the need for further validation and cross-model testing.

Abstract

Recent advances in structure-based protein design have accelerated de novo binder generation, yet interfaces on large domains or spanning multiple domains remain challenging due to high computational cost and declining success with increasing target size. We hypothesized that protein folding neural networks (PFNNs) operate in a "local-first" manner, prioritizing local interactions while displaying limited sensitivity to global foldability. Guided by this hypothesis, we propose an epitope-only strategy that retains only the discontinuous surface residues surrounding the binding site. Compared to intact-domain workflows, this approach improves in silico success rates by up to 80% and reduces the average time per successful design by up to forty-fold, enabling binder design against previously intractable targets such as ClpP and ALS3. Building on this foundation, we further developed a tailored pipeline that incorporates a Monte Carlo-based evolution step to overcome local minima and a position-specific biased inverse folding step to refine sequence patterns. Together, these advances not only establish a generalizable framework for efficient binder design against structurally large and otherwise inaccessible targets, but also support the broader "local-first" hypothesis as a guiding principle for PFNN-based design.
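
The epitope-only cropping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `crop_epitope` and `to_fragments`, the use of Cα coordinates, and the 10 Å radius are all assumptions made for illustration, and the sketch omits the surface-exposure filter that the abstract implies.

```python
import math

def crop_epitope(ca_coords, hotspots, radius=10.0):
    """Return indices of residues whose C-alpha lies within `radius`
    angstroms of any hotspot residue's C-alpha.

    Hypothetical helper: the distance criterion and the radius are
    illustrative assumptions, and no surface-exposure check is applied.
    """
    return [
        i for i, xyz in enumerate(ca_coords)
        if any(math.dist(xyz, ca_coords[h]) <= radius for h in hotspots)
    ]

def to_fragments(indices):
    """Group sorted residue indices into contiguous (start, end) runs.

    More than one run means the retained epitope is discontinuous in sequence.
    """
    fragments = []
    for i in indices:
        if fragments and i == fragments[-1][1] + 1:
            fragments[-1][1] = i          # extend the current run
        else:
            fragments.append([i, i])      # start a new run
    return [tuple(f) for f in fragments]

# Toy coordinates: residues 0-4 form a strand, 5-7 sit far away,
# and 8-9 fold back next to residue 0.
coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0), (16, 0, 0),
          (100, 0, 0), (104, 0, 0), (108, 0, 0),
          (0, 5, 0), (4, 5, 0)]
kept = crop_epitope(coords, hotspots=[0], radius=6.0)
fragments = to_fragments(kept)
```

In this toy case the retained residues fall into two sequence-separated runs, which is the sense in which the cropped template is a set of discontinuous fragments.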

Paper Structure

This paper contains 17 sections, 1 equation, 11 figures.

Figures (11)

  • Figure 1: Epitope-only hallucination for binder design against large targets. a. Workflow overview. Targets are cropped to discontinuous epitopes for binder hallucination. Monte Carlo-based evolution is then performed to overcome local minima. High-confidence initial designs are co-folded with the intact target for validation. Eligible designs are redesigned with per-residue biased MPNN to improve developability and local features. b. Increased speed and success rate of the epitope-only hallucination strategy, showcased by CP binder design against WDR5.
  • Figure 2: Efficiency improvement with the epitope-only hallucination strategy. a. Correlation between design success and refold success. Each dot represents a batch of designs under different conditions (e.g., target epitope range, evolution conditions). Lines are kernel density estimate (KDE) plots of the dot distribution. b. Increased success rate in refold validation compared to using the full domain as input. The red dashed line indicates the performance of the full domain. c. Reduced time per refold success. A missing bar indicates no successful designs, and thus no available data.
  • Figure 3: Design success rate improved with MC evolution.
  • Figure 4: Optimized sequence features with biased MPNN. a. Optimized distribution of pI in TcdB MP binders. Dashed line indicates the acceptable thresholds. The inset table reports the refold success rate with the extra pI constraint. b. Optimized occupancy of polar residues at the PPI. Dashed line indicates the acceptable thresholds. c. Refold success rate with the extra H-bond criteria.
  • Figure 5: Schematic of Monte Carlo evolution. At each step, we select 5% of binder positions with probability $\propto 1 - \mathrm{pLDDT}$ and propose substitutions either uniformly or from a position-specific scoring matrix (PSSM) derived from the hallucinated amino acid distributions. After each proposal, we refold the complex and recompute pLDDT and the amino acid distributions to update the hallucination loss. Unlike the semi-greedy scheme, MC also accepts loss-increasing moves, enabling early exploration followed by exploitation; we retain the lowest-loss frame along the trajectory. Matrix colors encode quality or sampling probability; red tones indicate lower quality/probability and blue tones higher.
  • ...and 6 more figures
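
The Monte Carlo evolution loop described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Metropolis acceptance rule, the geometric temperature schedule, and all function names are assumptions, and `toy_score` is a hypothetical stand-in for refolding the complex with a PFNN. For brevity the PSSM is held fixed here, whereas the caption describes recomputing the amino acid distributions after each proposal.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def select_positions(plddt, fraction=0.05, rng=random):
    """Pick about `fraction` of positions (at least one), weighted by 1 - pLDDT."""
    n_pick = max(1, round(fraction * len(plddt)))
    weights = [max(1.0 - p, 1e-6) for p in plddt]  # keep every weight positive
    chosen = set()
    while len(chosen) < n_pick:
        chosen.add(rng.choices(range(len(plddt)), weights=weights, k=1)[0])
    return sorted(chosen)

def propose(seq, positions, pssm=None, rng=random):
    """Substitute the chosen positions uniformly, or from per-position PSSM weights."""
    new = list(seq)
    for i in positions:
        weights = pssm[i] if pssm is not None else None
        new[i] = rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
    return "".join(new)

def mc_evolve(seq, score_fn, steps=100, t_start=1.0, t_end=0.01, pssm=None, seed=0):
    """Evolve `seq`; score_fn(seq) must return (loss, per-residue pLDDT in [0, 1]).

    Assumed details: Metropolis acceptance with a geometrically cooled
    temperature. The lowest-loss sequence along the trajectory is returned.
    """
    rng = random.Random(seed)
    loss, plddt = score_fn(seq)
    best_seq, best_loss = seq, loss
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        cand = propose(seq, select_positions(plddt, rng=rng), pssm, rng)
        cand_loss, cand_plddt = score_fn(cand)
        delta = cand_loss - loss
        # Loss-increasing moves are accepted with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            seq, loss, plddt = cand, cand_loss, cand_plddt
            if loss < best_loss:
                best_seq, best_loss = seq, loss
    return best_seq, best_loss

def toy_score(seq, target="ACDEFGHIKLMNPQRSTVWY"):
    """Hypothetical scorer: loss is the mismatch fraction against a fixed target."""
    match = [a == b for a, b in zip(seq, target)]
    return 1.0 - sum(match) / len(seq), [1.0 if m else 0.2 for m in match]

start = "A" * 20
best_seq, best_loss = mc_evolve(start, toy_score, steps=50)
```

Because the high temperature early in the run makes uphill moves likely and the low temperature late in the run makes them rare, the trajectory explores first and exploits later; tracking `best_seq` separately is what allows the lowest-loss frame to be retained even if the final state is worse.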