Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles

Zhanghan Ni; Yanjing Li; Zeju Qiu; Bernhard Schölkopf; Hongyu Guo; Weiyang Liu; Shengchao Liu

TL;DR

RigidSSL is a geometric pretraining framework that front-loads geometry learning before generative finetuning. It improves designability by up to 43% while enhancing novelty and diversity in unconditional generation.

Abstract

Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) existing methods cannot jointly learn protein geometry and design tasks, a gap that pretraining can address; (2) current pretraining methods mostly rely on local, non-rigid atomic representations aimed at downstream property prediction, which limits the global geometric understanding needed for protein generation; and (3) existing approaches have yet to effectively model the rich dynamic and conformational information of protein structures. To overcome these issues, we introduce $\textbf{RigidSSL}$ ($\textit{Rigidity-Aware Self-Supervised Learning}$), a geometric pretraining framework that front-loads geometry learning prior to generative finetuning. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures in the AlphaFold Protein Structure Database under simulated perturbations. Phase II (RigidSSL-MD) refines these representations on 1.3K molecular dynamics trajectories to capture physically realistic transitions. Underpinning both phases is a bi-directional, rigidity-aware flow matching objective that jointly optimizes translational and rotational dynamics to maximize mutual information between conformations. Empirically, RigidSSL variants improve designability by up to 43\% while enhancing novelty and diversity in unconditional generation. Furthermore, RigidSSL-Perturb improves the success rate by 5.8\% in zero-shot motif scaffolding, and RigidSSL-MD captures more biophysically realistic conformational ensembles in G protein-coupled receptor modeling. The code is available at https://github.com/ZhanghanNi/RigidSSL.git.
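
The abstract describes perturbing structures and jointly handling translational and rotational dynamics on rigid residue frames. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function names (`perturb_frames`, `interpolate_frames`), the Gaussian tangent-space rotation noise, and the noise scales are all assumptions made for illustration. Translations are interpolated linearly in $\mathbb{R}^3$ and rotations along the geodesic in $\operatorname{SO}(3)$.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def so3_log(R):
    """Rotation matrix -> axis-angle vector (valid for angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

def perturb_frames(R, t, sigma_t, sigma_r, rng):
    """Hypothetical view construction: noise each residue frame in SE(3).

    R: (N, 3, 3) rotations, t: (N, 3) translations.
    Assumption: rotation noise is Gaussian in the tangent space.
    """
    t_new = t + sigma_t * rng.standard_normal(t.shape)
    R_new = np.stack([so3_exp(sigma_r * rng.standard_normal(3)) @ Ri for Ri in R])
    return R_new, t_new

def interpolate_frames(R0, t0, R1, t1, s):
    """Intermediate frames at time s in [0, 1] between two views."""
    t_s = (1.0 - s) * t0 + s * t1  # straight line in R^3
    R_s = np.stack([Ra @ so3_exp(s * so3_log(Ra.T @ Rb))  # geodesic in SO(3)
                    for Ra, Rb in zip(R0, R1)])
    return R_s, t_s

rng = np.random.default_rng(0)
R0 = np.stack([np.eye(3)] * 4)        # four toy residue frames
t0 = rng.standard_normal((4, 3))
R1, t1 = perturb_frames(R0, t0, sigma_t=1.0, sigma_r=0.2, rng=rng)
R_mid, t_mid = interpolate_frames(R0, t0, R1, t1, 0.5)
```

Interpolating in the opposite direction only requires swapping the two views, which is one plausible reading of "bi-directional" here. The actual objective is defined in the paper's method section.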

Paper Structure (44 sections, 34 equations, 4 figures, 8 tables, 1 algorithm)

Introduction
Preliminaries
Method: RigidSSL
Reference Frame Canonicalization
View Constructions in a Two-Phase Pretraining Framework
Phase I: RigidSSL-Perturb
Phase II: RigidSSL-MD
Rigid Flow Matching for Multi-view Pretraining: RigidSSL
Experiments and Results
Downstream: Unconditional Protein Structure Generation
Case study: Zero-shot motif scaffolding
Case study: GPCR Conformational Ensemble Generation
Discussion
Appendix
Related Works
...and 29 more sections

Figures (4)

Figure 1: Overview of RigidSSL. (a) View construction in RigidSSL-Perturb: translational noise in $\mathbb{R}^3$ and rotational noise in $\operatorname{SO}(3)$ are applied to generate perturbations in the rigid body motion group $\mathrm{SE}(3)$. (b) View construction in RigidSSL-MD: perturbed states are obtained by sampling conformational frames from MD trajectories. (c) Rigidity-based pretraining in RigidSSL: proteins are canonicalized into a reference frame, intermediate states are constructed via interpolation of translations and rotations for each rigid residue frame, and bi-directional flow matching is applied for pretraining. Details can be found in the Method section.
Figure 2: Distribution of secondary structure elements ($\alpha$-helices, $\beta$-sheets, and coils) in the protein structure database (a-b) and in designable proteins (scRMSD $\leq$ 2.0 Å) generated by FoldFlow-2 under different pretraining methods (c-h). Plots for the structure database are color-coded by sequence length, whereas those for the generated structures are color-coded by scRMSD.
Figure 3: FoldFlow-2 generated structures (orange) compared against ProteinMPNN $\rightarrow$ ESMFold refolded structures (grey). Columns denote pretraining methods, and rows denote sequence lengths of 700 and 800.
Figure 4: Impact of translation and rotation noise scale on protein structure validity.
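
The captions for Figures 2 and 3 rely on scRMSD, computed between a generated backbone and its ProteinMPNN $\rightarrow$ ESMFold refolded counterpart, with scRMSD $\leq$ 2.0 Å marking a structure as designable. The sketch below shows one common way to compute such a quantity, assuming C$\alpha$ coordinates are already available as arrays. The function names are hypothetical, and the paper's exact evaluation protocol (for example, how many sequences are sampled per backbone) is not specified here.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid alignment."""
    P = P - P.mean(axis=0)            # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # rotation mapping P onto Q
    P_aligned = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_aligned - Q) ** 2, axis=1))))

def designability(sc_rmsds, threshold=2.0):
    """Fraction of backbones whose scRMSD (in Angstrom) is within the threshold."""
    sc_rmsds = np.asarray(sc_rmsds, dtype=float)
    return float(np.mean(sc_rmsds <= threshold))

# Toy check: a rotated and shifted copy of a point cloud aligns back exactly.
rng = np.random.default_rng(0)
generated = rng.standard_normal((50, 3))
angle = 0.7
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
refolded = generated @ Rz.T + np.array([5.0, -3.0, 2.0])
rmsd = kabsch_rmsd(generated, refolded)
frac = designability([0.5, 1.9, 2.0, 3.5])
```

In practice the refolded coordinates would come from a structure predictor rather than a synthetic transform. The toy data only demonstrates that the alignment removes rigid motion before the deviation is measured.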
