
Technical Report of HelixFold3 for Biomolecular Structure Prediction

Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Jie Gao, Wenlai Zhao, Hongkun Yu, Zhihua Wu, Xiaonan Zhang, Xiaomin Fang

TL;DR

HelixFold3 aims to replicate AlphaFold3's biomolecular structure prediction capabilities, covering ligands, nucleic acids, and protein complexes. Building on prior PaddleHelix work and trained on pre-2021 PDB data with self-distillation, it achieves competitive accuracy and is released as open source for academic research, alongside an online service offering visualization and API access. Benchmarks on PoseBusters, CASP15 RNA targets, PDB/SAbDab protein complexes, and covalent modifications show HelixFold3 matching or surpassing several baselines, and even AlphaFold3 on specific tasks, while remaining behind in some protein–protein scenarios. The work underscores the potential of accessible, open, diffusion-based pipelines to accelerate biomolecular discovery, with ongoing improvements and wider data coverage planned.

Abstract

The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predictions, AlphaFold3 is only partially accessible through a limited online server and has not been open-sourced, restricting further development. To address these challenges, the PaddleHelix team is developing HelixFold3, aiming to replicate AlphaFold3's capabilities. Leveraging insights from previous models and extensive datasets, HelixFold3 achieves accuracy comparable to AlphaFold3 in predicting the structures of conventional ligands, nucleic acids, and proteins. The initial release of HelixFold3 is available as open source on GitHub for academic research, promising to advance biomolecular research and accelerate discoveries. The latest version will be continuously updated on the HelixFold3 web server, providing both interactive visualization and API access.

Paper Structure

This paper contains 8 sections and 5 figures.

Figures (5)

  • Figure 1: Results for ligands.
  • Figure 2: Results for nucleic acid targets.
  • Figure 3: Results for protein targets.
  • Figure 4: Results for covalent modification.
  • Figure 5: Model confidence scores of HelixFold3.