TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations

Matteo Rossi; Ryan Pederson; Miles Wang-Henderson; Ben Kaufman; Edward C. Williams; Carl Underkoffler; Owen Lewis Howell; Adrian Layer; Stephan Thaler; Narbe Mardirossian; John Anthony Parkhill

TL;DR

TerraBind tackles the inference bottleneck of structure-based drug discovery by replacing expensive all-atom diffusion with a diffusion-free, coarse pocket-level representation learned by a pairformer. It combines frozen encoders (COATI-3 for ligands and ESM-2 for proteins) with a 48-layer pairformer, a coarse-grained pose module, and an affinity module that includes an epinet for calibrated uncertainty. The result is $26\times$ higher throughput and $\sim$20\% better binding affinity correlations on public and proprietary benchmarks. Key contributions include a multi-stage training curriculum, built-in structural uncertainty via the distogram entropy $H_{LP}$, and continual-learning-enabled DMTA optimization using the EMAX acquisition function. The approach delivers deployable high-throughput predictions with reliable uncertainty estimates, enabling billion-scale screening on industrially relevant data while retaining competitive pose accuracy and superior affinity predictions.
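The distogram entropy mentioned above can be illustrated with a small NumPy sketch. This page does not give the exact definition of $H_{LP}$, so the code assumes it is the mean Shannon entropy (in nats) of the predicted distance-bin distribution over ligand-protein token pairs. The function names, array shapes, and the boolean `is_ligand` mask are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along one axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ligand_protein_entropy(logits, is_ligand):
    """Mean distogram entropy over ligand-protein pairs (assumed form of H_LP).

    logits:    (N, N, B) array of distance-bin logits for all token pairs.
    is_ligand: (N,) boolean array, True for ligand tokens.
    """
    p = softmax(logits, axis=-1)
    # Shannon entropy per token pair; the epsilon guards against log(0).
    h = -(p * np.log(p + 1e-12)).sum(axis=-1)
    # Keep only pairs with one ligand token and one protein token.
    mask = is_ligand[:, None] & ~is_ligand[None, :]
    return float(h[mask].mean())
```

Under this assumed definition, a uniform distogram over $B$ bins gives the maximum value $\ln B$, and a distogram that puts all mass in one bin gives a value near zero, so lower values indicate a more confident predicted interface.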

Abstract

We present TerraBind, a foundation model for protein-ligand structure and binding affinity prediction that achieves 26-fold faster inference than state-of-the-art methods while improving affinity prediction accuracy by $\sim$20\%. Current deep learning approaches to structure-based drug design rely on expensive all-atom diffusion to generate 3D coordinates, creating inference bottlenecks that render large-scale compound screening computationally intractable. We challenge this paradigm with a critical hypothesis: full all-atom resolution is unnecessary for accurate small molecule pose and binding affinity prediction. TerraBind tests this hypothesis with a coarse pocket-level representation (protein C$_\beta$ atoms and ligand heavy atoms only) inside a multimodal architecture that combines COATI-3 molecular encodings with ESM-2 protein embeddings. The learned structural representations feed a diffusion-free optimization module for pose generation and a binding affinity likelihood prediction module. On structure prediction benchmarks (FoldBench, PoseBusters, Runs N' Poses), TerraBind matches diffusion-based baselines in ligand pose accuracy. Crucially, TerraBind outperforms Boltz-2 by $\sim$20\% in Pearson correlation for binding affinity prediction on both a public benchmark (CASP16) and a diverse proprietary dataset (18 biochemical/cell assays). We show that the affinity prediction module also provides well-calibrated affinity uncertainty estimates, addressing a critical gap in reliable compound prioritization for drug discovery. Furthermore, this module enables a continual learning framework and a hedged batch selection strategy that, in simulated drug discovery cycles, achieves 6$\times$ greater affinity improvement of selected molecules than greedy approaches.

Paper Structure (49 sections, 18 equations, 16 figures, 1 table, 2 algorithms)

Introduction
Methods
Frozen Pretrained Encoders
Structure Prediction Module
Pairformer Trunk
Structure Module Training Data
Structure Module Training Protocol
Structure Prediction Benchmarks
Coarse-grained Pose Module
Binding Affinity Prediction Module
Affinity Module Architecture
Affinity Likelihood Module
Binding Affinity Training Data
Affinity Module Training Procedure
Affinity Likelihood Module Training Procedure
...and 34 more sections

Figures (16)

Figure 1: TerraBind coarse structure prediction. Example predicted protein-ligand complexes for PDB entries 8JJT, 8B1C, 8W9I, and 8RQH (left to right). The purple particles show the predicted binding site residues within 15 Å of the ligand, while the gray particles show the experimental ground truth structure. Proteins are represented by C$_\beta$ atoms only (no side chains), and ligands are shown with all heavy atoms. TerraBind predicts the local binding site structure rather than full protein co-folding.
Figure 2: TerraBind performance overview. (a) End-to-end inference time per complex on a single A6000 GPU (196 tokens, 10 samples), demonstrating a $26\times$ speedup over Boltz-2. (b) Binding affinity prediction (Pearson correlation) on CASP16 and proprietary assay data, showing up to 20% improvement.
Figure 3: TerraBind performance overview (cont.). (c) Structure prediction performance compared to Boltz-1, aggregated across FoldBench, PoseBusters, and Runs N' Poses benchmarks. Left bars show ligand RMSD $<2$Å success rate; right bars show a stricter metric (RMSD $<2$Å and LDDT-PLI $>0.8$) that captures binding-relevant geometry. Models: Boltz-1 (full diffusion pipeline), Boltz-1 Trunk (Boltz-1 trunk representation with our coordinate optimization), TerraBind Pocket (196-token pocket context), and TerraBind (full protein context). See the Structure Prediction Benchmarks section for detailed model descriptions.
Figure 4: TerraBind Architecture. The model consists of four main components: (1) Frozen pretrained encoders: COATI-3 (E(3)-equivariant encoder and SMILES transformer for ligands) and ESM-2 (language model for proteins) provide initial representations without requiring MSA generation at inference time. (2) Structure module: A 48-layer pairformer architecture with triangle attention and multiplication learns pocket-level structural representations by predicting categorical distributions over distance bins for all token pairs. The entropy of these pairwise distance distributions provides built-in structural uncertainty quantification. (3) Pose module: When coordinates are needed, a simple coarse-grained optimization over the pairformer distance logits produces 3D structures in $<$0.2 seconds per complex without requiring diffusion. (4) Affinity module: A specialized 6-layer pairformer operates on frozen structural features to predict binding likelihood (binary classification) and affinity values (continuous regression), with an epistemic neural network (epinet) providing calibrated uncertainty quantification.
Figure 5: Structure prediction performance across benchmarks. Each panel shows ligand RMSD $<2\text{\AA}$ success rate (left bars) and combined success rate requiring both RMSD $<2\text{\AA}$ and LDDT-PLI $>0.8$ (right bars). (a) FoldBench, a low-homology benchmark.
...and 11 more figures
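
The Figure 4 caption describes coordinate generation as a simple optimization over the pairformer distance logits. The sketch below illustrates one way such a step could work and is not the authors' pose module: it assumes the distogram is first collapsed to expected pairwise distances, and that coordinates are then fit by plain gradient descent on a squared-distance-error (stress) objective. The function names, bin centers, step size, and iteration count are all hypothetical.

```python
import numpy as np

def expected_distances(logits, bin_centers):
    """Collapse a distogram (N, N, B) to expected pairwise distances (N, N)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    d = (p * bin_centers).sum(axis=-1)
    d = 0.5 * (d + d.T)        # enforce symmetry
    np.fill_diagonal(d, 0.0)   # a token is at distance zero from itself
    return d

def stress(x, target):
    """Sum over pairs i < j of (|x_i - x_j| - target_ij)^2."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return 0.5 * float(((dist - target) ** 2).sum())

def fit_coordinates(target, steps=500, seed=0):
    """Fit 3D coordinates to target distances by gradient descent (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    x = rng.normal(scale=target.max(), size=(n, 3))  # spread-out random start
    step = 0.25 / n                                  # hypothetical step size
    history = [stress(x, target)]
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, 1.0)                  # avoid 0/0 on the diagonal
        err = dist - target
        np.fill_diagonal(err, 0.0)
        # Gradient of the stress with respect to each x_i.
        grad = 2.0 * ((err / dist)[..., None] * diff).sum(axis=1)
        x -= step * grad
        history.append(stress(x, target))
    return x, history
```

A caller would pass the `(N, N, B)` logits and a length-`B` array of bin centers in Å to `expected_distances`, then hand the result to `fit_coordinates`. The returned `history` records the stress at each step, which makes it easy to check that the fit is actually improving.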
