Active learning for energy-based antibody optimization and enhanced screening

Kairi Furui, Masahito Ohue

TL;DR

This method integrates the RDE-Network deep learning model with Rosetta's energy function-based Flex ddG to explore mutants efficiently. It significantly improved screening performance over random selection and identified mutants with better binding properties without experimental $\Delta\Delta G$ data.

Abstract

Accurate prediction and optimization of protein-protein binding affinity is crucial for therapeutic antibody development. Although machine learning-based $\Delta\Delta G$ prediction methods are suitable for large-scale mutant screening, they struggle to predict the effects of multiple mutations for targets without existing binders. Energy function-based methods, though more accurate, are time-consuming and not ideal for large-scale screening. To address this, we propose an active learning workflow that efficiently trains a deep learning model to learn energy functions for specific targets, combining the advantages of both approaches. Our method integrates the RDE-Network deep learning model with Rosetta's energy function-based Flex ddG to efficiently explore mutants. In a case study targeting HER2-binding Trastuzumab mutants, our approach significantly improved screening performance over random selection and identified mutants with better binding properties without experimental $\Delta\Delta G$ data. This workflow advances computational antibody design by combining machine learning, physics-based computations, and active learning to achieve more efficient antibody development.
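The workflow described in the abstract can be illustrated with a minimal sketch of a generic active learning loop. Everything in it is an assumption made for illustration: `make_oracle` is a cheap toy stand-in for an expensive energy calculation such as Flex ddG, `AdditiveSurrogate` is a small linear model rather than RDE-Network, and the greedy "lowest predicted value first" acquisition rule, the four-letter alphabet, and the batch sizes are hypothetical choices, not the authors' settings.

```python
import itertools
import random

ALPHABET = "ACDE"  # toy residue alphabet (assumption)
N_SITES = 4        # number of mutated positions (assumption)


def make_oracle(seed=0):
    """Toy stand-in for an expensive energy function (lower is better)."""
    rng = random.Random(seed)
    weights = [{aa: rng.uniform(-1.0, 1.0) for aa in ALPHABET}
               for _ in range(N_SITES)]

    def oracle(mutant):
        return sum(weights[i][aa] for i, aa in enumerate(mutant))

    return oracle


class AdditiveSurrogate:
    """Linear model on one-hot residue features, trained by plain SGD."""

    def __init__(self):
        self.w = [{aa: 0.0 for aa in ALPHABET} for _ in range(N_SITES)]
        self.b = 0.0

    def predict(self, mutant):
        return self.b + sum(self.w[i][aa] for i, aa in enumerate(mutant))

    def fit(self, labeled, epochs=200, lr=0.05):
        for _ in range(epochs):
            for mutant, y in labeled.items():
                err = self.predict(mutant) - y
                self.b -= lr * err
                for i, aa in enumerate(mutant):
                    self.w[i][aa] -= lr * err


def active_learning(pool, oracle, n_cycles=4, batch_size=8, seed=0):
    """Alternate between fitting the surrogate and labeling its top picks."""
    rng = random.Random(seed)
    initial = rng.sample(pool, batch_size)      # random starting batch
    labeled = {m: oracle(m) for m in initial}
    model = AdditiveSurrogate()
    for _ in range(n_cycles):
        model.fit(labeled)
        candidates = [m for m in pool if m not in labeled]
        candidates.sort(key=model.predict)      # lowest predicted value first
        for m in candidates[:batch_size]:       # spend the oracle budget here
            labeled[m] = oracle(m)
    return initial, labeled, model


pool = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_SITES)]
oracle = make_oracle()
initial, labeled, model = active_learning(pool, oracle)
best = min(labeled, key=labeled.get)
print(f"labeled {len(labeled)} of {len(pool)} mutants; best found: {best}")
```

The point of the loop is that the expensive oracle is only called on the small batches the surrogate ranks highest, while the surrogate scores the whole pool cheaply at every cycle.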

Paper Structure (4 sections, 4 figures)

This paper contains 4 sections, 4 figures.

Figures (4)

  • Figure 1: Overview of the proposed active learning workflow.
  • Figure 2: (a) Transition of the calculated top 200 Flex ddG values of the selected mutants at each active learning cycle. (b) Number of selected mutants that bound and unbound based on Flex ddG at each active learning cycle.
  • Figure 3: The H-chain sequences of the HER2 mutant dataset were embedded using AbLang (Olsen et al., 2022) and compressed to 2D using UMAP (McInnes et al., 2018) for visualization. The red dots represent the selected mutants at each active learning cycle, and the blue dots represent the top 200 selections for that cycle. In the early cycles of active learning, the blue and red dots overlap, but in the later cycles, the red dots spread out more than the blue dots.
  • Figure 4: (a) Transition of Spearman correlation of the surrogate model for the Trastuzumab mutant dataset at each active learning cycle. (b) Transition of Spearman correlation on Flex ddG of the surrogate model for the Trastuzumab mutant dataset at each active learning cycle. Each correlation was calculated only for those labeled as binders. (c) Transition of ROC-AUC scores for binding classification for the Trastuzumab mutant dataset at each active learning cycle. (d) Transition of ROC-AUC scores for binding classification on Flex ddG for the Trastuzumab mutant dataset at each active learning cycle. The target variable was set as "binder" when $\Delta\Delta G_{\mathrm{Flex\ ddG}} < 0$.
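
The two metrics named in the Figure 4 caption can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' evaluation code: the helper names and the five example values are hypothetical, and reusing the $\Delta\Delta G < 0$ threshold to pick the binder subset for the Spearman correlation is an assumption. Because a lower predicted $\Delta\Delta G$ should indicate a binder, the negated prediction is used as the classification score.

```python
def rankdata(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rankdata(x), rankdata(y))


def roc_auc(labels, scores):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for flag, s in zip(labels, scores) if flag]
    neg = [s for flag, s in zip(labels, scores) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(pred_ddg, ref_ddg):
    """Return (Spearman on binders only, ROC-AUC for binder classification)."""
    is_binder = [y < 0 for y in ref_ddg]             # binder when ddG < 0
    auc = roc_auc(is_binder, [-p for p in pred_ddg])  # lower ddG = higher score
    pairs = [(p, y) for p, y in zip(pred_ddg, ref_ddg) if y < 0]
    rho = spearman([p for p, _ in pairs], [y for _, y in pairs])
    return rho, auc


# Hypothetical reference values and surrogate predictions.
ref = [-2.0, -1.0, -0.5, 0.5, 1.0]
pred = [-1.5, 0.4, -1.2, 0.3, 0.8]
rho, auc = evaluate(pred, ref)
print(f"Spearman (binders only) = {rho:.3f}, ROC-AUC = {auc:.3f}")
```

With real data, library routines such as `scipy.stats.spearmanr` and `sklearn.metrics.roc_auc_score` would normally replace these hand-written helpers.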