TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations

Matteo Rossi, Ryan Pederson, Miles Wang-Henderson, Ben Kaufman, Edward C. Williams, Carl Underkoffler, Owen Lewis Howell, Adrian Layer, Stephan Thaler, Narbe Mardirossian, John Anthony Parkhill

TL;DR

TerraBind tackles the inference bottleneck of structure-based drug discovery by replacing expensive all-atom diffusion with a diffusion-free, coarse pocket-level representation learned by a lightweight pairformer. It combines frozen encoders (COATI-3 for ligands and ESM-2 for proteins) with a compact 48-layer pairformer, a coarse-grained pose module, and an affinity module that includes an epinet for calibrated uncertainty, achieving $26\times$ higher throughput and about 20\% better binding affinity correlations on public and proprietary benchmarks. Key contributions include a multi-stage training curriculum, built-in structural uncertainty via distogram entropy $H_{LP}$, and continual-learning-enabled design-make-test-analyze (DMTA) optimization using the EMAX acquisition function. The approach delivers practically deployable high-throughput predictions with reliable uncertainty estimates, enabling billion-scale screening on industrially relevant data while retaining competitive pose accuracy and superior affinity predictions.
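The summary above names distogram entropy $H_{LP}$ but does not define it. The following is a minimal sketch, assuming $H_{LP}$ is the mean Shannon entropy of the predicted distance-bin distribution taken over ligand-protein token pairs. The function name `distogram_entropy`, the array layout, and the averaging convention are illustrative assumptions, not the paper's stated definition.

```python
import numpy as np

def distogram_entropy(logits, is_ligand):
    """Mean Shannon entropy (in nats) over ligand-protein token pairs.

    logits:    (N, N, B) unnormalized scores over B distance bins per pair.
    is_ligand: (N,) boolean mask, True for ligand tokens.
    NOTE: averaging over ligand-protein pairs is an assumed convention.
    """
    logits = np.asarray(logits, dtype=float)
    is_ligand = np.asarray(is_ligand, dtype=bool)
    # Numerically stable log-softmax over the bin axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    pair_entropy = -(np.exp(log_p) * log_p).sum(axis=-1)  # shape (N, N)
    # Keep only pairs with a ligand row and a protein column.
    lp_mask = is_ligand[:, None] & ~is_ligand[None, :]
    return float(pair_entropy[lp_mask].mean())
```

Under this convention, a uniform distribution over $B$ bins gives the maximum value $\ln B$, and a sharply peaked distogram gives a value near zero, so lower values would indicate a more confident predicted pocket geometry.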

Abstract

We present TerraBind, a foundation model for protein-ligand structure and binding affinity prediction that achieves 26-fold faster inference than state-of-the-art methods while improving affinity prediction accuracy by $\sim$20\%. Current deep learning approaches to structure-based drug design rely on expensive all-atom diffusion to generate 3D coordinates, creating inference bottlenecks that render large-scale compound screening computationally intractable. We challenge this paradigm with a critical hypothesis: full all-atom resolution is unnecessary for accurate small molecule pose and binding affinity prediction. TerraBind tests this hypothesis with a coarse pocket-level representation (protein C$_\beta$ atoms and ligand heavy atoms only) within a multimodal architecture that combines COATI-3 molecular encodings and ESM-2 protein embeddings. The resulting structural representations feed a diffusion-free optimization module for pose generation and a binding affinity likelihood prediction module. On structure prediction benchmarks (FoldBench, PoseBusters, Runs N' Poses), TerraBind matches diffusion-based baselines in ligand pose accuracy. Crucially, TerraBind outperforms Boltz-2 by $\sim$20\% in Pearson correlation for binding affinity prediction on both a public benchmark (CASP16) and a diverse proprietary dataset (18 biochemical/cell assays). We show that the affinity prediction module also provides well-calibrated affinity uncertainty estimates, addressing a critical gap in reliable compound prioritization for drug discovery. Furthermore, this module enables a continual learning framework and a hedged batch selection strategy that, in simulated drug discovery cycles, achieves 6$\times$ greater affinity improvement of selected molecules over greedy-based approaches.
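To illustrate how coordinates can be obtained from predicted distance distributions without diffusion, here is a rough sketch. It is not the paper's pose module: it swaps in classical multidimensional scaling (MDS) as a simple, deterministic stand-in for the optimization step. The function names, the `bin_centers` argument, and the use of expected distances are all assumptions made for illustration.

```python
import numpy as np

def expected_distances(logits, bin_centers):
    """Collapse (N, N, B) distance-bin logits to an (N, N) expected distance."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Probability-weighted average of the bin centers for every pair.
    return probs @ np.asarray(bin_centers, dtype=float)

def embed_from_distances(dist, dim=3):
    """Classical MDS: coordinates whose pairwise distances approximate dist."""
    dist = np.asarray(dist, dtype=float)
    dist = 0.5 * (dist + dist.T)  # enforce symmetry before decomposition
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering squared distances yields the Gram matrix of the points.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale
```

Distances alone determine coordinates only up to rotation, translation, and reflection, so a sketch like this cannot fix chirality; any practical pose module needs additional terms or constraints for that.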

Paper Structure (49 sections, 18 equations, 16 figures, 1 table, 2 algorithms)


Figures (16)

  • Figure 1: TerraBind coarse structure prediction. Example predicted protein-ligand complexes for PDB entries 8JJT, 8B1C, 8W9I, and 8RQH (left to right). The purple particles show the predicted binding site residues within 15 Å of the ligand, while the gray particles show the experimental ground truth structure. Proteins are represented by C$_\beta$ atoms only (no side chains), and ligands are shown with all heavy atoms. TerraBind predicts the local binding site structure rather than full protein co-folding.
  • Figure 2: TerraBind performance overview. (a) End-to-end inference time per complex on a single A6000 GPU (196 tokens, 10 samples), demonstrating a $26\times$ speedup over Boltz-2. (b) Binding affinity prediction (Pearson correlation) on CASP16 and proprietary assay data, showing up to 20% improvement.
  • Figure 3: TerraBind performance overview (cont.). (c) Structure prediction performance compared to Boltz-1, aggregated across FoldBench, PoseBusters, and Runs N' Poses benchmarks. Left bars show ligand RMSD $<2$Å success rate; right bars show a stricter metric (RMSD $<2$Å and LDDT-PLI $>0.8$) that captures binding-relevant geometry. Models: Boltz-1 (full diffusion pipeline), Boltz-1 Trunk (Boltz-1 trunk representation with our coordinate optimization), TerraBind Pocket (196-token pocket context), and TerraBind (full protein context). See the structure benchmarks section for detailed model descriptions.
  • Figure 4: TerraBind Architecture. The model consists of four main components: (1) Frozen pretrained encoders: COATI-3 (E(3)-equivariant encoder and SMILES transformer for ligands) and ESM-2 (language model for proteins) provide initial representations without requiring MSA generation at inference time. (2) Structure module: A 48-layer pairformer architecture with triangle attention and multiplication learns pocket-level structural representations by predicting categorical distributions over distance bins for all token pairs. The pairwise distance entropy provides built-in structural uncertainty quantification. (3) Pose module: When coordinates are needed, a simple coarse-grained optimization over the pairformer distance logits produces 3D structures in $<0.2$ seconds per complex without requiring diffusion. (4) Affinity module: A specialized 6-layer pairformer operates on frozen structural features to predict binding likelihood (binary classification) and affinity values (continuous regression), with an epistemic neural network (epinet) providing calibrated uncertainty quantification.
  • Figure 5: Structure prediction performance across benchmarks. Each panel shows ligand RMSD $<2\text{\AA}$ success rate (left bars) and combined success rate requiring both RMSD $<2\text{\AA}$ and LDDT-PLI $>0.8$ (right bars). (a) FoldBench, a low-homology benchmark.
  • ...and 11 more figures
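The two success metrics used in the structure prediction captions above (ligand RMSD $<2$Å alone, and RMSD $<2$Å combined with LDDT-PLI $>0.8$) can be computed from per-complex scores as in the sketch below. The function name and the input arrays are hypothetical; the thresholds are the ones quoted in the captions.

```python
import numpy as np

def pose_success_rates(rmsd, lddt_pli, rmsd_cutoff=2.0, lddt_cutoff=0.8):
    """Return (RMSD-only success rate, combined success rate).

    rmsd:     per-complex ligand RMSD values in angstroms.
    lddt_pli: per-complex LDDT-PLI scores in [0, 1].
    """
    rmsd = np.asarray(rmsd, dtype=float)
    lddt_pli = np.asarray(lddt_pli, dtype=float)
    rmsd_ok = rmsd < rmsd_cutoff
    # The stricter metric requires both criteria on the same complex.
    both_ok = rmsd_ok & (lddt_pli > lddt_cutoff)
    return float(rmsd_ok.mean()), float(both_ok.mean())
```

Because the combined criterion is a conjunction, its rate can never exceed the RMSD-only rate, which is why the right bars in each panel are at most as tall as the left bars.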