HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction

Gian Marco Visani; William Galvin; Zac Jones; Michael N. Pun; Eric Daniel; Kevin Borisiak; Utheri Wagura; Armita Nourmohammad

TL;DR

HERMES predicts the effects of mutations on protein stability and binding using fast, local, structure-based models: SO(3)-equivariant networks that operate on 3D atomic neighborhoods. The model is pre-trained to predict the identity of a masked amino acid from its 10 Å neighborhood and is then fine-tuned end-to-end on stability or binding data. Three inference protocols (fixed, relaxed, amortized) trade off speed against packing awareness. Across thermodynamic stability benchmarks and antigen-design tasks, HERMES is competitive with or better than state-of-the-art methods, while enabling rapid, structure-guided screening for stabilizing mutations. Explicit packing relaxation improves accuracy at added computational cost, which the amortized protocol mitigates. A wild-type bias inherited from pre-training persists, however, and warrants further debiasing and data diversification.

Abstract

Predicting the stability and fitness effects of amino acid mutations in proteins is a cornerstone of biological discovery and engineering. Various experimental techniques have been developed to measure mutational effects, providing us with extensive datasets across a diverse range of proteins. By training on these data, traditional computational modeling and more recent machine learning approaches have advanced significantly in predicting mutational effects. Here, we introduce HERMES, a 3D rotationally equivariant structure-based neural network model for mutational effect and stability prediction. Pre-trained to predict amino acid propensity from its surrounding 3D structure, HERMES can be fine-tuned for mutational effects using our open-source code. We present a suite of HERMES models, pre-trained with different strategies, and fine-tuned to predict the stability effect of mutations. Benchmarking against other models shows that HERMES often outperforms or matches their performance in predicting mutational effect on stability, binding, and fitness. HERMES offers versatile tools for evaluating mutational effects and can be fine-tuned for specific predictive objectives.

Paper Structure (20 sections, 7 equations, 28 figures, 7 tables)

Introduction
Model
Results
Predicting mutational effects on thermodynamic fold-stability
Antigen stabilization with HERMES for vaccine design
Predicting binding effect of mutations
Discussion
Methods
HERMES architecture
Pre-processing of protein structure data
HERMES pre-training
Rosetta relaxation of mutant structures for the HERMES-relaxed protocol
Training HERMES-amortized to implicitly learn about relaxation
HERMES fine-tuning for downstream tasks
Fine-tuning datasets
...and 5 more sections

Figures (28)

Figure 1: Overview of HERMES. (A) Model architecture: HERMES takes as input an all-atom structural neighborhood (10 Å radius) around a masked focal residue. Each atom is represented by its 3D coordinates, element type, partial charge, and solvent-accessible surface area (SASA). The neighborhood is projected onto a Zernike Fourier basis (spherical hologram) and processed by SO(3)-equivariant (rotation-equivariant) layers to produce a rotation-invariant embedding, which is mapped to 20 amino-acid-specific logits (Methods). (B) Pre-training: models are trained to predict the identity of the masked focal residue from its surrounding atomic neighborhood; logits in (A) are converted to amino-acid propensities (probabilities) via a softmax. (C) Fine-tuning for mutational effects: model weights are optimized to regress the logit difference between mutant and wild-type amino acids to the corresponding experimental mutational effect. (D) Inference protocols. HERMES-fixed scores a substitution as the difference between the mutant and wild-type amino-acid logits from a single forward pass, conditioned on the masked wild-type neighborhood $X_{\text{wt}}$. HERMES-relaxed conditions the mutant term on an approximate mutant neighborhood $\hat{X}_{\text{wt}\to\text{mt}}$, generated in silico by introducing the mutation into the wild-type structure and locally relaxing it with Rosetta [chaudhury_pyrosetta_2010]. HERMES-amortized distills the relaxed protocol by fine-tuning on HERMES-relaxed predictions, enabling fast fixed-style inference while retaining relaxation-aware behavior.
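The fixed-style scoring rule described in panel (D) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the logit values are made up, and `log_softmax` and `fixed_score` are hypothetical helper names. It shows why, within a single forward pass, the difference in log-probabilities equals the difference in logits: the softmax normalizer is shared by both terms and cancels.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues, one logit each


def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def fixed_score(logits, wt, mt):
    """Fixed-style score: log p(mt | X_wt) - log p(wt | X_wt)."""
    log_p = log_softmax(logits)
    return log_p[AMINO_ACIDS.index(mt)] - log_p[AMINO_ACIDS.index(wt)]


# Illustrative logits for one masked neighborhood (made-up numbers).
logits = [0.1 * i for i in range(20)]

score = fixed_score(logits, wt="A", mt="W")
logit_diff = logits[AMINO_ACIDS.index("W")] - logits[AMINO_ACIDS.index("A")]
```

Under these assumptions `score` and `logit_diff` agree up to floating-point error, so a fixed-style score needs only the two logits of interest.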
Figure 2: Predicting mutational effects on thermodynamic folding stability. Stabilizing-versus-destabilizing classification metrics are computed using $\Delta\Delta G < 0$ (experimental) and $\Delta \log p > 0$ (predicted) as cutoffs for stabilizing mutations. (A) T2837 results: zero-shot models (top) and models fine-tuned on, or trained only on, cDNA117k (middle and bottom). (B) Megascale test set results: zero-shot models (top) and models fine-tuned on, or trained only on, the Megascale training set (middle and bottom). Model names indicate the architecture, the coordinate-noise amplitude used, and, when applicable, the fine-tuning dataset (listed after "$+$"); "Untr." is short for Untrained and indicates models that had no pre-training and were instead trained only on stability effects. Only models trained with coordinate noise are shown; the noise amplitude is indicated within each model name as a standard deviation in Å. Results for models trained without noise are provided in Fig. \ref{fig:radial_plots__no_noise}.
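The cutoffs in this caption translate directly into a binary classification problem. The sketch below is an illustration under stated assumptions: the data are made up, `stabilizing_metrics` is a hypothetical helper, and precision, recall, and F1 are chosen as example metrics rather than taken as the full set shown in the figure.

```python
def stabilizing_metrics(ddg, dlogp):
    """Precision, recall and F1 for the 'stabilizing' class.

    Ground truth: experimental ddG < 0.
    Prediction:   predicted delta log p > 0.
    """
    truth = [g < 0 for g in ddg]
    pred = [s > 0 for s in dlogp]
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up experimental ddG values and model scores for five mutations.
ddg = [-1.2, 0.8, -0.3, 2.0, 0.5]
dlogp = [0.9, -0.4, -0.1, 0.2, -1.5]

precision, recall, f1 = stabilizing_metrics(ddg, dlogp)
```

In this toy set one of the two truly stabilizing mutations is recovered and one destabilizing mutation is wrongly flagged, so precision and recall are both one half.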
Figure 3: Impact of amino acid size changes on predicting mutational effects on protein stability. Amino acids are grouped into three size classes based on van der Waals volume (as listed), and predictions are stratified by wild-type and mutant size classes; "similar sizes" denotes substitutions within the same class. Stabilizing-versus-destabilizing classification metrics (rows) are then computed and shown for each stratum (columns), using $\Delta\Delta G < 0$ (experimental) and $\Delta \log p > 0$ (predicted) as cutoffs for stabilizing mutations. P-values for pairwise model comparisons are shown in Fig. \ref{fig:pairwise_pvalues_permutation_bucketed_by_sizereduced_precision_recall_f1}, and p-values comparing each model's performance on small$\to$large vs. large$\to$small substitutions are shown in Fig. \ref{fig:pvalues__bucketed_by_size__between_small_large_buckets__vertical}. Model names indicate the architecture, the coordinate-noise amplitude used during pre-training (as a standard deviation in Å), and, when applicable, the fine-tuning dataset (listed after "$+$").
Figure 4: Uncovering mutation preferences via model-averaged substitution matrices. (A) Heatmaps of model-averaged substitution matrices $M^{\text{model}}$ computed by averaging over the Megascale test set, shown alongside BLOSUM62 and the experimental matrix of mean $|\Delta\Delta G|$ values across mutations. Spearman correlations between matrices are reported in Fig. \ref{fig:spearmanr_between_model_average_substitution_matrices}. Core- and surface-restricted matrices (core: SASA $< 1$ Å$^2$; surface: SASA $> 3$ Å$^2$) are shown in Figs. \ref{fig:average_prediction_matrices__abs_symm__part_1} and \ref{fig:average_prediction_matrices__abs_symm__part_2}. (B) For each model and site subset (all, surface, core), boxplots summarize Spearman correlations between $M^{\text{model}}$ and property-difference matrices $M^{\text{prop}}$, grouped by property class (color). P-values from two-tailed t-tests comparing the surface vs. core correlations within each property class are shown on the right. P-values for between-model comparisons of property-class correlations are shown in Fig. \ref{fig:correlation_to_aa_properties__significance}.
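An averaged substitution matrix of the kind described in (A) can be sketched as a per-pair mean. This is an assumption-laden illustration: it averages absolute values for each (wild-type, mutant) pair, as the caption states for the experimental $|\Delta\Delta G|$ matrix; the exact definition of $M^{\text{model}}$ is given in the paper's Methods and may differ. The records are made up and `mean_abs_matrix` is a hypothetical helper.

```python
from collections import defaultdict


def mean_abs_matrix(records):
    """Average |value| over all records sharing the same (wt, mt) pair.

    `records` is an iterable of (wild_type, mutant, value) tuples, where
    value is either an experimental ddG or a model score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for wt, mt, value in records:
        totals[(wt, mt)] += abs(value)
        counts[(wt, mt)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Made-up per-mutation values pooled across sites and proteins.
records = [("A", "G", -1.0), ("A", "G", 3.0), ("L", "D", -2.5)]
matrix = mean_abs_matrix(records)
```

Storing the result as a dictionary keyed by residue pair leaves unobserved substitutions absent rather than silently zero, which matters when matrices are later compared by rank correlation.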
Figure 5: Identifying model biases with structure-conditioned reversibility. (A) Spearman correlations between experimental stability changes ($\Delta\Delta G$) and model predictions on the Ssym dataset. For each model, we report correlations for forward substitutions ($\Delta \log p_{fwd}$ vs. $\Delta\Delta G$) and reverse substitutions ($\Delta \log p_{rev}$ vs. $-\Delta\Delta G$). (B) Ssym proteins are stratified by their maximum sequence identity to pre-training proteins ($\geq 70\%$ vs. $< 70\%$). For each split (columns) and four representative HERMES variants (rows), the left panel shows scatter plots of $\Delta \log p_{fwd}$ vs. $\Delta \log p_{rev}$; an unbiased model should exhibit strong anti-correlation, summarized by the reversibility score (rev.; higher is more reversible; Eq. \ref{eq:rev_score}). Reversibility is highest for the model not pre-trained on wild-type amino-acid classification (green; bottom row) and is consistently higher for the low-similarity subset. Fig. \ref{fig:ssym_antisymmetry_reversibility_score_results} shows the reversibility scores across all models in (A). In each column, the right panel shows the distributions of the log-probabilities that make up $\Delta \log p_{fwd} = \log p(\text{mt} \mid X_{\text{wt}}) - \log p(\text{wt} \mid X_{\text{wt}})$ (solid lines) and $\Delta \log p_{rev} = \log p(\text{wt} \mid X_{\text{mt}}) - \log p(\text{mt} \mid X_{\text{mt}})$ (dashed lines). All models except the one that was not pre-trained on wild-type amino-acid classification (last row) exhibit elevated $\log p(\text{wt} \mid X_{\text{wt}})$.
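The anti-correlation argument in (B) can be made concrete with a toy calculation. Two assumptions are made here. First, the reverse score is taken to be evaluated entirely on the mutant structure, i.e. $\log p(\text{wt} \mid X_{\text{mt}}) - \log p(\text{mt} \mid X_{\text{mt}})$. Second, a plain Pearson correlation is used as a stand-in for the paper's reversibility score, whose actual definition (Eq. rev_score) is not reproduced in this summary. All numbers are made up and `pearson` is a hypothetical helper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Forward scores for three substitutions (made-up values).
fwd = [-2.5, 0.5, -3.8]

# Ideal, unbiased model: the reverse score is the exact negative.
rev_unbiased = [-s for s in fwd]

# Model biased toward whichever residue is present in the input
# structure: forward and reverse scores are both pushed negative.
rev_biased = [-2.2, -0.7, -3.2]

r_unbiased = pearson(fwd, rev_unbiased)
r_biased = pearson(fwd, rev_biased)
```

In this toy case the unbiased scores are perfectly anti-correlated, while the biased scores correlate positively, which is the signature of a preference for the residue already present in the structure.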
...and 23 more figures
