Efficiently Predicting Mutational Effect on Homologous Proteins by Evolution Encoding

Zhiqiang Zhong, Davide Mottin

TL;DR

This work proposes EvolMPNN (Evolution-aware Message Passing Neural Network), an efficient model for learning evolution-aware protein embeddings that performs up to 6.4% better than state-of-the-art methods and attains a 36X inference speedup over large pre-trained models.

Abstract

Predicting protein properties is paramount for biological and medical advancements. Current protein engineering mutates a typical protein, called the wild-type, to construct a family of homologous proteins and study their properties. Yet existing methods tend to overlook subtle mutations and so fail to capture their effect on the protein properties. To this end, we propose EvolMPNN (Evolution-aware Message Passing Neural Network), an efficient model to learn evolution-aware protein embeddings. EvolMPNN samples sets of anchor proteins, computes evolutionary information at the residue level, and employs a differentiable evolution-aware aggregation scheme over these sampled anchors. In this way, EvolMPNN can efficiently utilise a novel message-passing method to capture the mutation effect on proteins with respect to the anchor proteins. Afterwards, the aggregated evolution-aware embeddings are integrated with sequence embeddings to generate the final comprehensive protein embeddings. Our model performs up to 6.4% better than state-of-the-art methods and attains a 36X inference speedup in comparison with large pre-trained models. Code and models are available at https://github.com/zhiqiangzhongddu/EvolMPNN.
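
The pipeline outlined in the abstract (sample anchors, compare residues against them, aggregate, then combine with sequence embeddings) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function name `evolution_aware_embeddings`, the array shapes, the use of residue-wise differences as the "evolutionary information", and the plain mean that stands in for the learned, differentiable aggregation are all assumptions made for illustration only.

```python
import numpy as np

def evolution_aware_embeddings(residue_emb, seq_emb, num_anchors=2, seed=0):
    """Toy anchor-based aggregation (hypothetical helper, not the paper's code).

    residue_emb: (M, L, d) per-residue embeddings for M proteins of length L.
    seq_emb:     (M, d) sequence-level embeddings.
    Returns an (M, 2d) array: sequence embedding + aggregated anchor signal.
    """
    rng = np.random.default_rng(seed)
    m = residue_emb.shape[0]
    # Step 1: sample a set of anchor proteins (uniformly here; an assumption).
    anchors = rng.choice(m, size=min(num_anchors, m), replace=False)
    # Step 2: residue-level difference of every protein to every anchor,
    # shape (M, A, L, d). Positions without a mutation contribute zeros.
    diff = residue_emb[:, None, :, :] - residue_emb[anchors][None]
    # Step 3: aggregate over residues, then over anchors -> (M, d).
    # A plain mean replaces the learned aggregation described in the abstract.
    evo = diff.mean(axis=2).mean(axis=1)
    # Step 4: integrate with the sequence embeddings by concatenation.
    return np.concatenate([seq_emb, evo], axis=1)

rng = np.random.default_rng(1)
out = evolution_aware_embeddings(rng.normal(size=(5, 6, 4)), rng.normal(size=(5, 4)))
print(out.shape)
```

The only point of the sketch is the data flow: each protein is described relative to a small set of anchors rather than in isolation, so a single-residue mutation shows up as a non-zero difference instead of being averaged away.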

Paper Structure

This paper contains 15 sections, 2 theorems, 8 equations, 4 figures, 3 tables, and 1 algorithm.

Key Result

Theorem (Bourgain Theorem)

Given any finite metric space $(\mathcal{M}, \mathrm{F}_{\textsc{Dist}})$ with $|\mathcal{M}| = M$, there exists an embedding of $(\mathcal{M}, \mathrm{F}_{\textsc{Dist}})$ into $\mathbb{R}^k$ under any $l_p$ metric, where $k = O(\log^2 M)$, and the distortion of the embedding is $O(\log M)$.
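
To make the statement concrete, here is a minimal sketch of the standard randomised construction behind Bourgain's theorem: draw about $\log^2 M$ random anchor sets, where a point joins a set at level $i$ with probability $1/2^i$, and use the distance to the nearest member of each set as one coordinate. The function names, the constant `c`, the fallback for empty sets, and the Hamming distance used in the example are illustrative assumptions, not details taken from the paper.

```python
import math
import random

def bourgain_embedding(points, dist, c=1, seed=0):
    """Embed a finite metric space using distances to random anchor sets."""
    rng = random.Random(seed)
    m = len(points)
    log_m = max(1, math.ceil(math.log2(m)))
    anchor_sets = []
    for i in range(1, log_m + 1):          # sampling levels 1 .. log M
        for _ in range(c * log_m):         # c * log M sets per level
            # Each point joins the set independently with probability 1 / 2^i.
            s = [p for p in points if rng.random() < 1.0 / 2 ** i]
            if not s:
                # Assumption: fall back to one random point so the min is defined.
                s = [rng.choice(points)]
            anchor_sets.append(s)
    k = len(anchor_sets)                   # k = c * log_m**2 coordinates
    # Coordinate j is the distance to the closest member of set j, scaled by 1/k.
    return [[min(dist(p, a) for a in s) / k for s in anchor_sets] for p in points]

def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

seqs = ["MKTA", "MKTG", "MRTG", "ARTG"]   # toy homologous "family"
emb = bourgain_embedding(seqs, hamming)
print(len(emb), len(emb[0]))
```

By the triangle inequality each coordinate changes by at most $\mathrm{F}_{\textsc{Dist}}(u, v) / k$ between two points $u$ and $v$, so the $l_1$ distance between embeddings never exceeds the original distance. The $O(\log M)$ distortion bound concerns how much distances can shrink, and holds with high probability over the random sets.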

Figures (4)

  • Figure 1: Protein property prediction on homologous protein family. (a) An example homologous protein family with labelled nearby mutants with few mutations. We aim to predict the label of unknown mutants with more mutations. (b) The evolutionary pattern for (a). For instance, $\mathbf{Y}_0$ is the label vector of the corresponding protein sequence, and $(p_1, m_1)$ indicates mutation $m_1$ at position $p_1$ of the protein's amino acid sequence.
  • Figure 2: Our EvolMPNN framework encodes protein mutations via a sapient combination of residue evolution and sequence encoding.
  • Figure 3: Performance on protein groups of different numbers of mutations, with the Low-vs-High split and avg. epoch inference time on GB1 dataset.
  • Figure 4: EvolMPNN performance on AAV's Low-vs-High (a) and 2-vs-Rest (b) splits, with different hyper-parameters.

Theorems & Definitions (3)

  • Definition: Distortion
  • Theorem: Bourgain Theorem
  • Theorem: Constructive Proof of Bourgain Theorem