RIGA-Fold: A General Framework for Protein Inverse Folding via Recurrent Interaction and Geometric Awareness

Sisi Yuan; Jiehuang Chen; Junchuang Cai; Dong Xu; Xueliang Li; Zexuan Zhu; Junkai Ji

TL;DR

RIGA-Fold tackles protein inverse folding by combining geometry-aware learning with evolutionary priors, targeting two weaknesses of existing GNN-based methods: limited local receptive fields and single-pass inference. The framework introduces a Geometric Attention Update (GAU) in which edge features serve as attention keys, plus a Global Context Bridge that injects global information to capture long-range dependencies. An enhanced variant, RIGA-Fold*, adds dual-stream priors from ESM-2 and ESM-IF and refines its predictions in a recycling loop. Results on CATH 4.2, TS50, and TS500 show strong sequence recovery and structural consistency, with RIGA-Fold* reported as state of the art. The approach offers a practical path toward robust de novo protein design by integrating geometric, semantic, and iterative refinement components, at the cost of higher inference latency due to recycling.

Abstract

Protein inverse folding, the task of predicting amino acid sequences for desired structures, is pivotal for de novo protein design. However, existing GNN-based methods typically suffer from restricted receptive fields that miss long-range dependencies and a "single-pass" inference paradigm that leads to error accumulation. To address these bottlenecks, we propose RIGA-Fold, a framework that synergizes Recurrent Interaction with Geometric Awareness. At the micro-level, we introduce a Geometric Attention Update (GAU) module where edge features explicitly serve as attention keys, ensuring strictly SE(3)-invariant local encoding. At the macro-level, we design an attention-based Global Context Bridge that acts as a soft gating mechanism to dynamically inject global topological information. Furthermore, to bridge the gap between structural and sequence modalities, we introduce an enhanced variant, RIGA-Fold*, which integrates trainable geometric features with frozen evolutionary priors from ESM-2 and ESM-IF via a dual-stream architecture. Finally, a biologically inspired "predict-recycle-refine" strategy is implemented to iteratively denoise sequence distributions. Extensive experiments on CATH 4.2, TS50, and TS500 benchmarks demonstrate that our geometric framework is highly competitive, while RIGA-Fold* significantly outperforms state-of-the-art baselines in both sequence recovery and structural consistency.
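The abstract describes two mechanisms that may be easier to follow in code: attention whose keys come from edge features, and a soft gate driven by a pooled global context. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names, the RBF distance featurization, the weight shapes, and the exact form of the gate are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distance_edge_features(X, nbr_idx, d=8, max_dist=20.0):
    """Hypothetical invariant edge features: RBF expansion of pairwise distances.

    X: (N, 3) coordinates, nbr_idx: (N, K) neighbor indices. Returns (N, K, d).
    """
    dist = np.linalg.norm(X[:, None, :] - X[nbr_idx], axis=-1)  # (N, K)
    centers = np.linspace(0.0, max_dist, d)                      # (d,)
    width = max_dist / d
    return np.exp(-((dist[..., None] - centers) / width) ** 2)

def edge_as_key_attention(h_V, h_E, nbr_idx, W_q, W_k, W_v):
    """Assumed form of an edge-as-key update: queries come from the center
    node, keys from edge features, values from edge + neighbor node features."""
    d = W_q.shape[1]
    q = h_V @ W_q                                                 # (N, d)
    k = h_E @ W_k                                                 # (N, K, d)
    v = np.concatenate([h_E, h_V[nbr_idx]], axis=-1) @ W_v        # (N, K, d)
    scores = np.einsum("nd,nkd->nk", q, k) / np.sqrt(d)           # (N, K)
    attn = softmax(scores, axis=-1)                               # per-node weights
    return np.einsum("nk,nkd->nd", attn, v), attn

def global_context_gate(h_V, w_pool, W_g):
    """Assumed soft gate: attention-pool all nodes into one context vector,
    then scale every node feature by a sigmoid gate computed from it."""
    pool = softmax(h_V @ w_pool, axis=0)                          # (N,)
    context = pool @ h_V                                          # (d,)
    gate = 1.0 / (1.0 + np.exp(-(context @ W_g)))                 # (d,)
    return h_V * gate

# Toy usage with random inputs (all sizes are arbitrary).
rng = np.random.default_rng(0)
N, K, d = 6, 3, 8
X = rng.normal(size=(N, 3)) * 5.0
nbr_idx = rng.integers(0, N, size=(N, K))
h_V = rng.normal(size=(N, d))
h_E = distance_edge_features(X, nbr_idx, d=d)
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_v = rng.normal(size=(2 * d, d))
out, attn = edge_as_key_attention(h_V, h_E, nbr_idx, W_q, W_k, W_v)
gated = global_context_gate(out, rng.normal(size=d), rng.normal(size=(d, d)))
print(out.shape, gated.shape)
```

In this sketch the attention weights depend on coordinates only through pairwise distances, so rotating or translating `X` leaves them unchanged. That is one simple way to obtain the SE(3) invariance the abstract mentions; the paper's actual edge features may be richer.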

Paper Structure (41 sections, 6 theorems, 19 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Method
Graph Construction and Featurization
The Structure Encoder Layer
Geometric Attention Update (GAU)
Dynamic Edge Update
Global Context Bridge
Dual-Stream Fusion with Pretrained Priors
Iterative Self-Correction Strategy
Objective Function
Experiments
Benchmarks, Metrics, and Experimental Setup
Main Results on CATH and Generalization Benchmarks
Robustness Across Varying Protein Lengths
...and 26 more sections

Key Result

Lemma 3.1

Let $a_{ij}=\mathrm{softmax}_j(s_{ij})$. Then for any neighbor $j$ with a reverse edge $(j,i)$, the sensitivity of the return probability product is: where $\phi^{(j)}_i = a_{ji}(1-a_{ji})$ is the softmax slope.
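The displayed equation of the lemma is not reproduced here. As a hedged illustration only, suppose the "return probability product" is $a_{ij}a_{ji}$ and the sensitivity is taken with respect to the reverse logit $s_{ji}$. Both are assumptions, as is treating $a_{ij}$ as independent of $s_{ji}$. Under them, the standard softmax derivative gives:

```latex
\frac{\partial a_{ji}}{\partial s_{ji}} = a_{ji}\,(1 - a_{ji}) = \phi^{(j)}_i,
\qquad
\frac{\partial\,(a_{ij}\,a_{ji})}{\partial s_{ji}} = a_{ij}\,\phi^{(j)}_i .
```

Since $\phi^{(j)}_i \le 1/4$, with equality at $a_{ji} = 1/2$, this reading bounds the sensitivity by $a_{ij}/4$. The lemma as stated in the paper may differ in form or in the variable being differentiated.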

Figures (4)

Figure 1: Overall architecture of RIGA-Fold and its enhanced variant RIGA-Fold*. (Left) The RIGA-Fold* Framework. This macro-level system incorporates the RIGA-Fold structure encoder (inner panel) alongside a PLM-based dual-stream module. A sequence recycling feedback loop is implemented to iteratively refine the predicted sequence $S_{t-1}$. (Right) The GAU Mechanism. A detailed view of the core interaction within RIGA-Fold, where explicit geometric edge features ($h_E$) drive the attention update to ensure SE(3)-invariant information flow.
Figure 2: Performance analysis across different sequence lengths. Results on CATH 4.2 (left column) and TS500 (right column). Top row shows Perplexity (lower is better), bottom row shows Recovery Rate (higher is better). RIGA-Fold* consistently outperforms baselines across all length intervals.
Figure 3: Qualitative comparison on the short-chain target 2avp.A. (Left) 3D structure visualization where residues correctly predicted by RIGA-Fold* but missed by ScFold are highlighted in red sticks. This confirms that our model effectively captures the hydrophobic core packing constraints. (Right) Full sequence comparison. The sequence is split into two parts for visualization. RIGA-Fold* achieves a high recovery rate of 88.2% with only minor errors, whereas baselines (ScFold, PiFold) fail to reconstruct significant portions of the sequence (marked in red).
Figure 4: Structural validity verification using AlphaFold3. The native structures are shown in green, while the backbones predicted by the base RIGA-Fold and the enhanced RIGA-Fold* are shown in gray and purple, respectively. RIGA-Fold* consistently achieves higher sequence recovery and lower RMSD compared to the base model, particularly on complex all-$\beta$ topologies.
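The captions above report Perplexity and Recovery Rate. As a rough guide to what these numbers measure, here is a small sketch using the definitions commonly used for inverse folding: recovery is the fraction of positions where the predicted residue matches the native one, and perplexity is the exponential of the mean negative log-likelihood of the native residues. The function names and the 20-letter alphabet ordering are assumptions for illustration, and the paper's evaluation code may aggregate differently (for example, a median over proteins).

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 amino acids

def sequence_recovery(pred, native):
    """Fraction of positions where the predicted residue equals the native one."""
    if len(pred) != len(native) or len(native) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == n for p, n in zip(pred, native)) / len(native)

def perplexity(log_probs, native):
    """exp(mean negative log-likelihood) of the native residues.

    log_probs: (L, 20) array of per-position log-probabilities.
    native: length-L string over ALPHABET.
    """
    idx = np.array([ALPHABET.index(aa) for aa in native])
    nll = -log_probs[np.arange(len(idx)), idx]
    return float(np.exp(nll.mean()))

# Toy usage: three of four residues match, and a uniform predictor
# over 20 residues has perplexity close to 20.
print(sequence_recovery("ACDE", "ACDF"))
uniform = np.full((4, 20), np.log(1.0 / 20.0))
print(perplexity(uniform, "ACDF"))
```

Lower perplexity means the model assigns higher probability to the native residues, while recovery only counts exact matches of the top prediction, which is why the two metrics can move somewhat independently.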

Theorems & Definitions (8)

Lemma 3.1: Softmax product sensitivity
Lemma 3.2: Reverse-logit contraction under directional update
proof
Proposition 3.3: Layerwise contraction of $r^l$
Theorem 4.1: Resistance Monotonicity
proof
Proposition 4.2: Two-hop effective path
Theorem 5.1: Monotone improvement under recycling
