Hermes: Large DEL Datasets Train Generalizable Protein-Ligand Binding Prediction Models

Maxwell Kleinsasser; Brayden J. Halverson; Edward Kraft; Sean Francis-Lyon; Sarah E. Hugo; Mackenzie R. Roman; Ben Miller; Andrew D. Blevins; Ian K. Quigley

TL;DR

Hermes is a lightweight transformer trained exclusively on large-scale DEL screening data to learn transferable protein-ligand interaction (PLI) representations, enabling generalization to held-out targets and unseen chemistries without traditional affinity labels. Using pre-trained embeddings (ESM-2 for proteins and ChemBERTa for ligands) and a joint cross-attention mechanism, Hermes fuses protein and ligand information efficiently and supports inference fast enough for virtual screening. Across diverse benchmarks, Hermes generalizes to external datasets and different assay systems, although performance varies with target space and data quality; an ensemble of checkpoints improves stability. The results highlight the value of DEL data for learning transferable PLI representations and show substantial speed advantages over structure-based models, suggesting that DEL-trained models can drive scalable, early-stage drug discovery, subject to limitations from label noise and memorization tendencies.

Abstract

The quality and consistency of training data remain critical bottlenecks for protein-ligand binding prediction. Public affinity datasets, aggregated from thousands of labs and assay formats, introduce biases that limit model generalization and complicate evaluation. DNA-encoded chemical libraries (DELs) offer a potential solution: unified experimental protocols generating massive binding datasets across diverse chemical and protein target space. We present Hermes, a lightweight transformer trained exclusively on DEL data from screens against hundreds of protein targets, representing one of the largest and most protein-diverse DEL training sets applied to protein-ligand interaction (PLI) modeling to date. Despite never seeing traditional affinity measurements during training, Hermes generalizes to held-out targets, novel chemical scaffolds, and external benchmarks derived from public binding data and high-throughput screens. Our results demonstrate that DEL data alone captures transferable protein-ligand interaction representations, while Hermes' minimal architecture enables inference speeds suitable for large-scale virtual screening.

Paper Structure (33 sections, 8 figures, 5 tables)

Introduction
Methods
Hermes
Training Data
Evaluation
Benchmark Models
Boltz-2.
XGBoost Baseline.
Benchmark Evaluation Data.
Results
Benchmark Comparisons
Boltz-2.
XGBoost Baseline.
Inference Speed.
Checkpoint and Training Details
...and 18 more sections

Figures (8)

Figure 1: Hermes architecture diagram.
Figure 2: DEL screening workflow and training data composition. (A) Schematic of the DEL screening assay: (i) protein expression, purification, and immobilization; (ii) library incubation with iterative washing to remove non-binders; (iii) PCR amplification and sequencing of retained compounds; (iv) quality control, enrichment quantification, and hit classification. (B) Distribution of training samples across protein targets, colored by protein family.
Figure 3: Hermes per-protein AUROC scores by evaluation dataset. Point size indicates the number of positive samples for each protein target. Results are stratified by protein family (kinase vs. non-kinase) to assess whether the kinase-enriched training set composition translates to differential evaluation performance. The dashed line indicates random classifier performance (AUROC = 0.5). Solid horizontal lines indicate mean AUROC within each dataset and protein family group.
Figure 4: Hermes vs. benchmarks per-protein AUROC comparison across evaluation datasets. Point size indicates number of binders within the protein target. All but the Public Binders/Decoys dataset are subsampled to 50k samples for Boltz-2 inference time/cost feasibility.
Figure 5: Chemical similarity between training and validation sets. Kernel density estimates of ECFP4 Tanimoto similarity between binders in each validation set and a representative training set. (a) Mean pairwise Tanimoto similarity: for each of 1,000 randomly sampled validation binders, the mean similarity to 10,000 randomly sampled training binders. (b) Maximum nearest-neighbor Tanimoto similarity: for each validation binder, the highest similarity to any hit molecule in the training set. Fingerprints were computed as 2048-bit Morgan fingerprints with radius 2. Note that the $x$-axis scales differ between panels to accommodate the distinct ranges of each metric.
...and 3 more figures
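
The two similarity metrics named in the Figure 5 caption can be illustrated with a minimal sketch. It assumes each fingerprint is already available as a set of on-bit indices; in practice these would come from a cheminformatics toolkit producing 2048-bit Morgan fingerprints with radius 2, which is not shown here. The function names and default sample sizes are illustrative choices that mirror the caption, not the authors' code.

```python
import random

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def mean_pairwise_similarity(val_fps, train_fps, n_val=1000, n_train=10000, seed=0):
    """Panel (a): for each sampled validation binder, the mean similarity
    to a random sample of training binders."""
    rng = random.Random(seed)
    val_sample = rng.sample(val_fps, min(n_val, len(val_fps)))
    train_sample = rng.sample(train_fps, min(n_train, len(train_fps)))
    return [
        sum(tanimoto(v, t) for t in train_sample) / len(train_sample)
        for v in val_sample
    ]

def max_nn_similarity(val_fps, train_fps):
    """Panel (b): for each validation binder, the highest similarity
    to any training molecule (its nearest neighbor)."""
    return [max(tanimoto(v, t) for t in train_fps) for v in val_fps]

# Toy fingerprints (hypothetical bit indices, for illustration only).
train = [frozenset({1, 2, 3}), frozenset({3, 4})]
val = [frozenset({1, 2, 3, 4})]
print(mean_pairwise_similarity(val, train))  # [0.625]
print(max_nn_similarity(val, train))         # [0.75]
```

The mean pairwise value summarizes how close a validation binder is to the training chemistry overall, while the nearest-neighbor maximum shows whether any single close analog exists in training, which is why the two panels cover different ranges.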
