Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions

Xiaoran Jiao, Weian Mao, Wengong Jin, Peiyuan Yang, Hao Chen, Chunhua Shen

TL;DR

The Boltzmann Alignment technique is proposed to transfer knowledge from pre-trained inverse folding models to $\Delta \Delta G$ prediction, introducing a physical inductive bias and achieving both supervised and unsupervised state-of-the-art (SoTA) performance.

Abstract

Predicting the change in binding free energy ($\Delta\Delta G$) is crucial for understanding and modulating protein-protein interactions, which are critical in drug design. Because experimental $\Delta\Delta G$ data are scarce, existing methods focus on pre-training while neglecting the importance of alignment. In this work, we propose the Boltzmann Alignment technique to transfer knowledge from pre-trained inverse folding models to $\Delta\Delta G$ prediction. We begin by analyzing the thermodynamic definition of $\Delta\Delta G$ and introducing the Boltzmann distribution to connect energy with the protein conformational distribution. However, the protein conformational distribution is intractable; therefore, we employ Bayes' theorem to circumvent direct estimation and instead use the log-likelihood provided by protein inverse folding models for $\Delta\Delta G$ estimation. Compared to previous inverse folding-based methods, our method explicitly accounts for the unbound state of the protein complex in the $\Delta\Delta G$ thermodynamic cycle, introducing a physical inductive bias and achieving both supervised and unsupervised state-of-the-art (SoTA) performance. Experimental results on SKEMPI v2 indicate that our method achieves Spearman coefficients of 0.3201 (unsupervised) and 0.5134 (supervised), significantly surpassing the previously reported SoTA values of 0.2632 and 0.4324, respectively. Furthermore, we demonstrate the capability of our method on binding energy prediction, protein-protein docking, and antibody optimization tasks.
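The abstract states the estimator only in words. The sketch below is one way the pieces can combine, under assumptions that are ours rather than confirmed by the paper. First, each state's free energy is taken as $-k_BT \log p(X \mid S)$ plus a sequence-independent constant (Boltzmann). Second, Bayes' theorem rewrites $\log p(X \mid S)$ as $\log p(S \mid X) + \log p(X) - \log p(S)$. Third, the backbones $X$ do not change on mutation. Fourth, the sequence prior factorizes over the two chains $A$ and $B$. Under these assumptions the $\log p(S)$ terms cancel within $\Delta G$, and the $\log p(X)$ terms become a constant that cancels within $\Delta\Delta G$. This leaves $\Delta G(S) \approx -k_BT\,[\log p(S_{AB} \mid X_{AB}) - \log p(S_A \mid X_A) - \log p(S_B \mid X_B)] + C$ and $\Delta\Delta G = \Delta G(S^{\text{mut}}) - \Delta G(S^{\text{wt}})$, in which $C$ cancels. The two subtracted terms are where the unbound state enters. The Python takes the three log-likelihoods as given, so no inverse folding model is included. The names `ba_ddg` and `per_structure` and the `KT` value are hypothetical. The Spearman helpers are a plain reimplementation that uses average ranks for ties.

```python
from collections import defaultdict
from math import isnan, sqrt

# Hypothetical scale: k_B*T in kcal/mol at about 298 K; a fitted scale could replace it.
KT = 0.593


def ba_ddg(ll_wt, ll_mut, kt=KT):
    """Sketch of a ddG estimate from inverse-folding log-likelihoods.

    Each argument is a tuple (log p(S_AB|X_AB), log p(S_A|X_A), log p(S_B|X_B)):
    the bound complex followed by the two unbound chains.
    """
    def dg_bind(ll):
        ll_complex, ll_a, ll_b = ll
        # Bound-state term minus the two unbound-state terms, scaled by -kT.
        return -kt * (ll_complex - ll_a - ll_b)

    return dg_bind(ll_mut) - dg_bind(ll_wt)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_structure(metric, structure_ids, pred, true, min_points=2):
    """Apply `metric` separately to the mutations of each structure."""
    groups = defaultdict(lambda: ([], []))
    for sid, p, t in zip(structure_ids, pred, true):
        groups[sid][0].append(p)
        groups[sid][1].append(t)
    scores = {}
    for sid, (p, t) in groups.items():
        if len(p) >= min_points:
            s = metric(p, t)
            if not isnan(s):  # skip structures with constant predictions or labels
                scores[sid] = s
    return scores


if __name__ == "__main__":
    # Made-up log-likelihoods for a single mutation in chain A (chain B unchanged).
    print(ba_ddg((-100.0, -60.0, -45.0), (-103.0, -61.0, -45.0), kt=1.0))  # 2.0
```

A positive $\Delta\Delta G$ here means that the mutation lowers the complex likelihood more than the unbound-chain likelihoods, so binding is predicted to be weaker. This section does not say whether the reported Spearman values pool all mutations or average over structures. `per_structure` mirrors the per-structure scores shown in Figure 3.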

Paper Structure

This paper contains 26 sections, 13 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of the Boltzmann Alignment technique. Left: inference with a protein inverse folding model. Right: illustration of thermodynamic cycle in the modulation of protein-protein interactions.
  • Figure 2: Comparison of correlations between experimental $\Delta\Delta G$ and $\Delta\Delta G$ predicted by three representative methods.
  • Figure 3: Distributions of per-structure Pearson correlation scores and Spearman correlation scores for six representative methods.
  • Figure 4: Left: The Spearman correlation between predicted and true binding free energy ($\Delta G$) on the SAbDab test set. Right: Performance of selection methods for rigid protein-protein docking as the number of generative samples increases. "Perfect Selection" shows the best possible performance with an ideal selection method. "DiffDock-PP" refers to the confidence model proposed by DiffDock-PP (ketata2023diffdock). "BA-DDG" uses our method to estimate $\Delta G$ for selection. Performance is the fraction of the 13 antibody complex generation tasks on the DIPS test set (NEURIPS2019_6c7de1f2/DIPS) that achieve a C-RMSD below 5 Å.
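Figure 4 (right) evaluates how well each method selects a pose from a growing pool of docking samples. The sketch below shows that evaluation in minimal form. It assumes that each generated pose has a C-RMSD to the reference and a predicted $\Delta G$. It also assumes that the BA-DDG selector keeps the pose with the lowest predicted $\Delta G$, which is our reading of the caption and not a confirmed detail. The function names are hypothetical, and the DiffDock-PP confidence model is not reproduced.

```python
def success_rate(tasks, n_samples, select, threshold=5.0):
    """Fraction of tasks whose selected pose, among the first `n_samples`
    generated poses, has C-RMSD below `threshold` (in Angstrom).

    tasks: one list per complex of (c_rmsd, predicted_dg) tuples.
    select: maps a list of candidate tuples to the chosen tuple.
    """
    hits = 0
    for candidates in tasks:
        chosen = select(candidates[:n_samples])
        if chosen[0] < threshold:
            hits += 1
    return hits / len(tasks)


def perfect_selection(pool):
    """Oracle: the pose with the lowest C-RMSD (upper bound for any selector)."""
    return min(pool, key=lambda c: c[0])


def lowest_predicted_dg(pool):
    """Hypothetical ranking rule: the pose with the most favorable predicted dG."""
    return min(pool, key=lambda c: c[1])


if __name__ == "__main__":
    # Made-up (c_rmsd, predicted_dg) values for three toy tasks, four samples each.
    tasks = [
        [(8.0, -5.0), (3.0, -7.0), (9.0, -4.0), (2.0, -6.0)],
        [(12.0, -9.0), (6.0, -3.0), (4.0, -2.0), (7.0, -1.0)],
        [(10.0, -1.0), (11.0, -2.0), (9.5, -3.0), (1.0, -8.0)],
    ]
    for n in (1, 2, 4):
        print(n, success_rate(tasks, n, perfect_selection),
              success_rate(tasks, n, lowest_predicted_dg))
```

The oracle takes a minimum over a growing prefix of samples, so its curve never decreases as samples are added. A score-based selector has no such guarantee. The distance between the two curves therefore measures how well the scores rank poses.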