PRIMRose: Insights into the Per-Residue Energy Metrics of Proteins with Double InDel Mutations using Deep Learning

Stella Brown, Nicolas Preisig, Autumn Davis, Brian Hutchinson, Filip Jagodzinski

TL;DR

PRIMRose introduces a per-residue deep learning approach to predict Rosetta energy changes caused by double InDel mutations in proteins. Using a fully convolutional neural network with residual connections, it predicts 14 Rosetta scores for every residue across nine proteins, including mutation contexts not seen during training, with high accuracy on local energy terms. The work demonstrates strong generalization to novel insertion positions and amino-acid pairs, highlights the importance of protein structure context, and shows substantial computational savings over traditional Rosetta simulations. These results offer rapid, interpretable mutational insights and a path toward scaling to larger mutational landscapes.

Abstract

Understanding how mutations affect protein structure is essential for advances in computational biology and bioinformatics. We introduce PRIMRose, a novel approach that predicts energy values for each residue given a mutated protein sequence. Unlike previous models that assess global energy shifts, our method analyzes the localized energetic impact of double amino acid insertions or deletions (InDels) at the individual residue level, enabling residue-specific insights into structural and functional disruption. We implement a Convolutional Neural Network architecture to predict the energy change of each residue in a mutated protein. We train our model on datasets constructed from nine proteins, grouped into three categories: one set with exhaustive double InDel mutations, another with approximately 145k randomly sampled double InDel mutations, and a third with approximately 80k randomly sampled double InDel mutations. Our model achieves high predictive accuracy across a range of energy metrics as calculated by the Rosetta molecular modeling suite and reveals localized patterns that influence model performance, such as solvent accessibility and secondary structure context. This per-residue analysis offers new insights into the mutational tolerance of specific regions within proteins and provides more interpretable and biologically meaningful predictions of InDels' effects.
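
To make the per-residue idea concrete, the following is a minimal sketch of why a fully convolutional network with a residual connection yields one output vector per residue regardless of sequence length. It is not the authors' architecture: the one-hot encoding, the kernel size of 5, the single residual block, and the untrained random weights are all assumptions chosen for illustration, and names such as `predict_per_residue` are hypothetical. Only the count of 14 scores per residue comes from the summary above.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in sequence]


def conv1d_same(x, weights, bias):
    """1D convolution with zero padding, so output length equals input length.

    x:       list of L positions, each a list of c_in floats
    weights: nested lists shaped [c_out][kernel][c_in], kernel size odd
    bias:    list of c_out floats
    """
    length, kernel = len(x), len(weights[0])
    half = kernel // 2
    out = []
    for i in range(length):
        row = []
        for w_out, b_out in zip(weights, bias):
            acc = b_out
            for j in range(kernel):
                pos = i + j - half
                if 0 <= pos < length:  # positions outside the sequence act as zeros
                    acc += sum(w * v for w, v in zip(w_out[j], x[pos]))
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, weights, bias):
    """Compute x + relu(conv(x)); requires c_out == c_in."""
    y = relu(conv1d_same(x, weights, bias))
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def random_weights(rng, c_out, kernel, c_in):
    """Untrained placeholder weights (assumption: illustration only)."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_in)]
             for _ in range(kernel)] for _ in range(c_out)]


def predict_per_residue(sequence, n_scores=14, seed=0):
    """Map a mutated sequence of length L to L vectors of n_scores values."""
    rng = random.Random(seed)
    channels = len(AMINO_ACIDS)
    x = one_hot(sequence)
    x = residual_block(x, random_weights(rng, channels, 5, channels), [0.0] * channels)
    # A kernel-size-1 convolution acts as a per-residue linear output head.
    head = random_weights(rng, n_scores, 1, channels)
    return conv1d_same(x, head, [0.0] * n_scores)


scores = predict_per_residue("MKTAYIAK")
print(len(scores), len(scores[0]))  # 8 14
```

Because every layer preserves sequence length, inserting or deleting residues changes only the number of output rows, which is what allows a single model to score mutants of different lengths.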

Paper Structure

This paper contains 13 sections, 3 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: One-to-one plots showing our performance for 1c44 on $\text{Test}_{\text{Rand}}$. The x-axis denotes the true values and the y-axis denotes the predictions made by our PRIMRose model.
  • Figure 2: Plots showing how our performance varies based on the locations of the insertions. The x-axis denotes the set of all mutants including the given insertion position and the y-axis denotes the average root mean square error over that set. The line style shows the secondary structure at the insertion position.
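
The quantity on the y-axis of Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes each mutant records its insertion positions alongside true and predicted per-residue energies, and that the RMSE is computed per mutant before averaging over all mutants that share a position. The record layout and the function names are hypothetical, and the numbers are made up.

```python
import math
from collections import defaultdict


def rmse(true_vals, pred_vals):
    """Root mean square error between two equal-length lists of energies."""
    if not true_vals or len(true_vals) != len(pred_vals):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))
    return math.sqrt(squared / len(true_vals))


def mean_rmse_by_insertion_position(mutants):
    """Average per-mutant RMSE over every mutant containing a given position.

    mutants: iterable of dicts with keys
      'positions' - tuple of insertion positions in the mutant
      'true'      - per-residue energies from Rosetta
      'pred'      - per-residue energies from the model
    """
    buckets = defaultdict(list)
    for mutant in mutants:
        error = rmse(mutant["true"], mutant["pred"])
        for pos in set(mutant["positions"]):  # count each mutant once per position
            buckets[pos].append(error)
    return {pos: sum(errs) / len(errs) for pos, errs in sorted(buckets.items())}


# Toy data (illustrative values only).
toy_mutants = [
    {"positions": (3, 7), "true": [0.0, 0.0], "pred": [2.0, 2.0]},
    {"positions": (3, 9), "true": [1.0, 2.0], "pred": [1.0, 2.0]},
    {"positions": (7, 9), "true": [0.0, 0.0, 0.0], "pred": [1.0, 1.0, 1.0]},
]
print(mean_rmse_by_insertion_position(toy_mutants))  # {3: 1.0, 7: 1.5, 9: 0.5}
```

Each mutant contributes to the bucket of every insertion position it contains, so a double-insertion mutant appears in two buckets, matching the caption's "set of all mutants including the given insertion position".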