APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

Hyun Park, Parth Patel, Roland Haas, E. A. Huerta

TL;DR

APACE is a framework that couples AlphaFold2 with advanced supercomputing and Ray-based parallelism to drastically accelerate protein structure predictions while enabling conformational diversity. By optimizing both CPU and GPU stages and implementing robust data management on Delta and Polaris HPC systems, APACE reduces time-to-solution from days to minutes and scales to hundreds of ensembles. The approach preserves AlphaFold2 accuracy, extends beyond its default diversity by enabling dropout, multiple models, and ColabFold-like parameter tuning, and demonstrates practical applicability for automated, large-scale discovery workflows. This work enables seamless integration with robotic laboratories and complex biophysical investigations, accelerating discovery in biophysics and drug design.

Abstract

The prediction of protein 3D structure from amino acid sequence is a computational grand challenge in biophysics, and plays a key role in applications ranging from drug discovery to genome interpretation. The advent of AI models, such as AlphaFold, is revolutionizing applications that depend on robust protein structure prediction algorithms. To maximize the impact and usability of these novel AI tools, we introduce APACE (AlphaFold2 and advanced computing as a service), a novel computational framework that effectively handles this AI model and its TB-size database to conduct accelerated protein structure prediction analyses in modern supercomputing environments. We deployed APACE on the Delta and Polaris supercomputers, and quantified its performance for accurate protein structure predictions using four exemplar proteins: 6AWO, 6OAN, 7MEZ, and 6D6U. Using up to 300 ensembles, distributed across 200 NVIDIA A100 GPUs, we found that APACE is up to two orders of magnitude faster than off-the-shelf AlphaFold2 implementations, reducing time-to-solution from weeks to minutes. This computational approach may be readily linked with robotics laboratories to automate and accelerate scientific discovery.
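
To give a feel for the ensemble-to-GPU arithmetic quoted in the abstract (300 ensembles on 200 GPUs), here is a minimal sketch of a static round-robin assignment. It is an illustration only, not APACE's scheduler: the names `assign_round_robin` and `n_waves` are hypothetical, and the sketch assumes one ensemble per GPU at a time with roughly equal run times, whereas a Ray-based workflow would typically hand out tasks dynamically as GPUs become free.

```python
from math import ceil


def assign_round_robin(n_tasks: int, n_workers: int) -> dict[int, list[int]]:
    """Map each worker (GPU) index to the task (ensemble) indices it would run.

    Hypothetical helper: tasks are dealt out like cards, one per worker per pass.
    """
    if n_tasks < 0 or n_workers <= 0:
        raise ValueError("n_tasks must be >= 0 and n_workers must be > 0")
    assignment: dict[int, list[int]] = {w: [] for w in range(n_workers)}
    for task in range(n_tasks):
        assignment[task % n_workers].append(task)
    return assignment


def n_waves(n_tasks: int, n_workers: int) -> int:
    """Number of sequential passes needed if each worker runs one task at a time."""
    return ceil(n_tasks / n_workers)


# Figures taken from the abstract: 300 ensembles, 200 GPUs.
plan = assign_round_robin(300, 200)
waves = n_waves(300, 200)
busiest = max(len(tasks) for tasks in plan.values())
print(f"waves needed: {waves}, most tasks on a single GPU: {busiest}")
```

Under these assumptions, half of the GPUs run two ensembles and the other half run one, so the wall-clock time is set by two sequential predictions rather than 300.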

Paper Structure (15 sections, 4 figures, 2 tables)

This paper contains 15 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Protein structure used to test APACE: serotonin transporter (PDB accession: 6AWO; shorthand SERT). The left panel shows an overlay of a predicted conformational ensemble of 100 SERT structures, which agrees well with the ground truth SERT structure. The right panel shows the overlaid ensemble with the highly variable transmembrane domains (TMs), identified from root mean square fluctuations (RMSFs), in cyan. Figures are generated with Visual Molecular Dynamics (VMD) (Humphrey et al., 1996).
  • Figure 2: Protein structure used to test APACE: the antibody-antigen complex Plasmodium vivax Duffy-binding protein (PDB accession: 6OAN). The structure agrees well with the ground truth bound conformation. In the predicted conformational ensemble, the complementarity-determining region (CDR; loops) of the antibody (red) and its binding against the helical secondary structure epitopes of the antigen (blue) are predicted well when compared to the ground truth.
  • Figure 3: Protein structure used to test APACE: a phosphoinositide 3-kinase (PI3K) consisting of p110$\gamma$ and p101 subunits (PDB accession: 7MEZ). The structure agrees well with the ground truth bound conformation. Although some loop secondary structures in the p101 subunit are mispredicted (red; the top left loop is predicted as an alpha helix rather than a loop), the binding pose at the interface between p101 and p110$\gamma$ (blue) is well predicted, suggesting an evolutionarily conserved binding interface. The remaining secondary structures and the overall heterodimer structure of the predicted conformational ensemble are also comparable with the ground truth structure.
  • Figure 4: Protein structure used to test APACE: a pentameric $\textrm{GABA}_{\textrm{A}}$ receptor (PDB accession: 6D6U). We show one predicted heteropentamer structure of the neurotransmitter $\textrm{GABA}_{\textrm{A}}$ receptor. The left panel shows a structure comparable with the ground truth. The blue and gray chains form one homodimer and the red and orange chains form the other; the yellow chain is a monomer whose sequence differs from those of the two homodimers. The location of the transmembrane helices (pointing into the page) does not exactly reproduce the ground truth structure, which is understandable since APACE does not take the membrane as an input when predicting the transmembrane domain. Nevertheless, the overall structure is comparable with the ground truth. The right panel, by contrast, shows an erroneous AI-predicted structure in which the blue and gray chains bind to each other. This structure may have high thermodynamic instability and steric hindrance when being crystallized.
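
The RMSF values mentioned in the Figure 1 caption can be illustrated with a short NumPy sketch. It uses the standard definition, $\mathrm{RMSF}_i = \sqrt{\tfrac{1}{N}\sum_{n=1}^{N} \lVert \mathbf{r}_i^{(n)} - \langle \mathbf{r}_i \rangle \rVert^2}$, where $N$ is the number of conformations and $\mathbf{r}_i^{(n)}$ is the position of atom $i$ in conformation $n$. The function name `rmsf` and the toy coordinates are assumptions made for illustration; the sketch also assumes the conformations have already been superimposed on a common reference, and it is not necessarily how the authors computed their values.

```python
import numpy as np


def rmsf(coords: np.ndarray) -> np.ndarray:
    """Per-atom root mean square fluctuation over an ensemble.

    coords: array of shape (n_conformations, n_atoms, 3), assumed pre-aligned.
    Returns an array of shape (n_atoms,).
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 3 or coords.shape[-1] != 3:
        raise ValueError("coords must have shape (n_conformations, n_atoms, 3)")
    mean_pos = coords.mean(axis=0)                    # average position of each atom
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=-1)  # squared distance from the mean
    return np.sqrt(sq_dev.mean(axis=0))               # average over conformations


# Toy ensemble (hypothetical data): atom 0 is fixed, atom 1 moves by +/-1 along x.
toy = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
])
values = rmsf(toy)
print(values)
```

Atoms (or residues) with large RMSF across the ensemble correspond to flexible regions such as the highlighted transmembrane domains, while values near zero indicate positions that barely change between predictions.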