Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2
Yeqing Lin, Minji Lee, Zhao Zhang, Mohammed AlQuraishi
TL;DR
This work introduces Genie 2, a diffusion-based framework for structure- and motif-aware protein design at scale. By combining motif-conditioned diffusion and SE(3)-equivariant attention with a multi-motif scaffolding framework and large-scale data augmentation from the AlphaFold Database (AFDB), Genie 2 achieves state-of-the-art designability, diversity, and novelty in unconditional generation and performs strongly on both single- and multi-motif scaffolding. The approach enables the design of proteins with multiple independent functional motifs, thereby expanding the design space for enzymes, biosensors, and therapeutics. This performance comes at a cost in sampling speed and computational complexity, which points to future work on faster inference and scaling to larger proteins. Inference and training code and model weights are publicly released to support structure-based protein design research and applications.
Abstract
Protein diffusion models have emerged as a promising approach for protein design. One such pioneering model is Genie, a method that asymmetrically represents protein structures during the forward and backward processes, using simple Gaussian noising for the former and expressive SE(3)-equivariant attention for the latter. In this work we introduce Genie 2, extending Genie to capture a larger and more diverse protein structure space through architectural innovations and massive data augmentation. Genie 2 adds motif scaffolding capabilities via a novel multi-motif framework that designs co-occurring motifs with unspecified inter-motif positions and orientations. This makes possible complex protein designs that engage multiple interaction partners and perform multiple functions. On both unconditional and conditional generation, Genie 2 achieves state-of-the-art performance, outperforming all known methods on key design metrics including designability, diversity, and novelty. Genie 2 also solves more motif scaffolding problems than other methods and does so with more unique and varied solutions. Taken together, these advances set a new standard for structure-based protein design. Genie 2 inference and training code, as well as model weights, are freely available at: https://github.com/aqlaboratory/genie2.
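To make the "simple Gaussian noising" of the forward process concrete, the following is a minimal sketch and not Genie 2's actual implementation. It assumes a standard DDPM-style forward process applied independently to each C-alpha coordinate; the cosine schedule, the step count, the toy backbone, and the function names `cosine_alpha_bar` and `noise_coordinates` are all illustrative assumptions that do not come from the abstract.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal fraction alpha_bar_t under a cosine schedule (assumed choice)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def noise_coordinates(coords, t, num_steps, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I) per coordinate."""
    a_bar = cosine_alpha_bar(t, num_steps)
    signal = math.sqrt(a_bar)
    sigma = math.sqrt(1.0 - a_bar)
    return [
        tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]


# Toy backbone: four C-alpha atoms spaced 3.8 units apart along x (illustrative only).
x0 = [(3.8 * i, 0.0, 0.0) for i in range(4)]
rng = random.Random(0)
x_early = noise_coordinates(x0, t=1, num_steps=1000, rng=rng)    # nearly unchanged
x_late = noise_coordinates(x0, t=999, num_steps=1000, rng=rng)   # close to pure noise
```

In this sketch the forward process needs no rotation-aware machinery because isotropic Gaussian noise treats every direction alike. Under the asymmetric design described in the abstract, the structure-aware SE(3)-equivariant attention is reserved for the backward (denoising) process, which is not shown here.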
