
FoldToken2: Learning compact, invariant and generative protein structure language

Zhangyang Gao, Cheng Tan, Stan Z. Li

TL;DR

FoldToken2 addresses the challenge of representing SE(3)-equivariant protein structures with an invariant latent language. It learns a compact, discrete embedding through an invariant encoder (BlockGAT), a vector-quantized compressor (SoftCVQ/T-SoftCVQ), and an SE(3)-equivariant decoder. The framework introduces a frame-based block graph, frame-aware quantization, and iterative SE(3) frame refinement to reconstruct 3D coordinates from discrete tokens. Empirically, FoldToken2 substantially improves on FoldToken1 in single-chain reconstruction (roughly 20% in TMScore and 81% in RMSD) and generalizes well to multi-chain complexes, with high training efficiency and a small model footprint. These results suggest a practical, scalable path toward invariant structure representations that can support structure alignment and generation, with potential extensions to TokenFlow and FoldGPT for sequence-to-structure tasks.
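The TL;DR relies on describing each residue by a local SE(3) frame, so that what the encoder sees does not change when the whole protein is rotated or translated. The sketch below illustrates that idea under stated assumptions: it uses only NumPy, takes backbone N/CA/C coordinates, builds one frame per residue by Gram-Schmidt, computes pairwise relative rotations and translations, and then checks numerically that these features are invariant to a global rigid motion. The function names are hypothetical, and this is a generic frame construction, not the paper's BlockGAT or its block graph.

```python
import numpy as np

def backbone_frames(n, ca, c):
    """Build one orthonormal frame per residue from N, CA, C coordinates
    (each of shape [L, 3]) by Gram-Schmidt; the frame origin is CA."""
    e1 = c - ca
    e1 = e1 / np.linalg.norm(e1, axis=-1, keepdims=True)
    u2 = n - ca
    e2 = u2 - np.sum(u2 * e1, axis=-1, keepdims=True) * e1  # remove the e1 component
    e2 = e2 / np.linalg.norm(e2, axis=-1, keepdims=True)
    e3 = np.cross(e1, e2)
    rot = np.stack([e1, e2, e3], axis=-1)  # [L, 3, 3], columns are the frame axes
    return rot, ca

def relative_frame_features(rot, trans):
    """Pairwise relative rotation R_i^T R_j and relative translation
    R_i^T (t_j - t_i), both expressed in residue i's local frame."""
    rel_rot = np.einsum("iba,jbc->ijac", rot, rot)    # [L, L, 3, 3]
    diff = trans[None, :, :] - trans[:, None, :]      # diff[i, j] = t_j - t_i
    rel_trans = np.einsum("iba,ijb->ija", rot, diff)  # [L, L, 3]
    return rel_rot, rel_trans

# Invariance check: apply a random proper rotation q and translation t.
rng = np.random.default_rng(0)
L = 6
n, ca, c = (rng.normal(size=(L, 3)) for _ in range(3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # make q a proper rotation (det = +1)
t = rng.normal(size=3)

def move(x):
    return x @ q.T + t

rot1, tr1 = backbone_frames(n, ca, c)
rot2, tr2 = backbone_frames(move(n), move(ca), move(c))
feat1 = relative_frame_features(rot1, tr1)
feat2 = relative_frame_features(rot2, tr2)
print(all(np.allclose(a, b) for a, b in zip(feat1, feat2)))  # True
```

Because a global motion maps every frame R_i to qR_i and every origin t_i to qt_i + t, the factors of q cancel in both relative quantities, which is why invariant inputs can be built from frames without any data augmentation.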

Abstract

The equivariant nature of 3D coordinates has posed long-term challenges in protein structure representation learning, alignment, and generation. Can we create a compact and invariant language that equivalently represents protein structures? Toward this goal, we propose FoldToken2, which transforms equivariant structures into discrete tokens while preserving the recoverability of the original structures. From FoldToken1 to FoldToken2, we improve three key components: (1) the invariant structure encoder, (2) the vector-quantized compressor, and (3) the equivariant structure decoder. We evaluate FoldToken2 on the protein structure reconstruction task and show that it outperforms FoldToken1 by 20% in TMScore and 81% in RMSD. FoldToken2 may be the first method that works well for quantizing both single-chain and multi-chain protein structures. We believe that FoldToken2 will inspire further improvements in protein structure representation learning, structure alignment, and structure generation.
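The abstract reports gains in TMScore and RMSD. As a reminder of what these reconstruction metrics measure, here is a minimal NumPy sketch (function names hypothetical) that superposes a predicted C-alpha trace onto the ground truth with the Kabsch algorithm and then computes RMSD and a TM-score with the length-dependent d0 commonly used by TM-score tools. Assumption: the TM-score here is evaluated under the RMSD-optimal superposition rather than the score-maximizing search performed by the official TM-score program, so it can only underestimate the official value; the paper's exact evaluation protocol is not described on this page.

```python
import numpy as np

def kabsch_superpose(pred, true):
    """Rigidly superpose pred onto true ([L, 3] each) by least squares (Kabsch)."""
    p_mean, t_mean = pred.mean(axis=0), true.mean(axis=0)
    p, t = pred - p_mean, true - t_mean
    u, _, vt = np.linalg.svd(p.T @ t)       # SVD of the 3x3 cross-covariance
    d = np.sign(np.linalg.det(vt.T @ u.T))  # d = -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + t_mean

def rmsd(pred, true):
    aligned = kabsch_superpose(pred, true)
    return float(np.sqrt(np.mean(np.sum((aligned - true) ** 2, axis=-1))))

def tm_score_kabsch(pred, true):
    """TM-score under the RMSD-optimal superposition (a lower bound on the
    score-maximizing TM-score)."""
    n = len(true)
    d0 = 1.24 * np.cbrt(n - 15) - 1.8 if n > 21 else 0.5
    dist = np.linalg.norm(kabsch_superpose(pred, true) - true, axis=-1)
    return float(np.mean(1.0 / (1.0 + (dist / d0) ** 2)))

rng = np.random.default_rng(0)
true = rng.normal(scale=10.0, size=(50, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1
moved = true @ q.T + 5.0                     # same structure in a new pose
noisy = moved + rng.normal(scale=1.0, size=true.shape)
print(rmsd(moved, true), tm_score_kabsch(moved, true))
print(rmsd(noisy, true), tm_score_kabsch(noisy, true))
```

A rigidly moved copy scores an RMSD of essentially zero and a TM-score of essentially one, while coordinate noise raises RMSD and lowers TM-score; this is why both metrics are reported for reconstruction from tokens.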

Paper Structure (32 sections, 13 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: The overall framework of FoldTokenizer2, which contains an encoder, a quantizer, and a decoder. In FoldToken2, we use BlockGAT to encode protein structures as invariant embeddings, SoftCVQ to quantize the embeddings into discrete tokens, and SE(3) layers to recover the protein structures iteratively.
  • Figure 2: Reconstruction performance without VQ. "SAE" means structure autoencoder without VQ.
  • Figure 3: Single-chain reconstruction. Grey and colored residues represent the ground truth and predicted ones.
  • Figure 4: Multi-chain reconstruction. Grey and colored residues represent the ground truth and predicted ones.
  • Figure 5: Flow matching for sequence-structure translation.
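Figure 1 describes the quantizer as mapping invariant embeddings to discrete tokens. The internals of SoftCVQ and T-SoftCVQ are not given on this page, so the following is only a generic soft vector-quantization sketch under stated assumptions: a fixed codebook, a softmax over negative squared distances with a temperature `tau`, and hypothetical names throughout. Training can use the differentiable soft mixture of code vectors, while the discrete token is simply the nearest code.

```python
import numpy as np

def soft_quantize(z, codebook, tau=1.0):
    """Generic temperature-controlled soft vector quantization (an assumption,
    not the paper's SoftCVQ). z: [N, D] embeddings; codebook: [K, D]."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # [N, K]
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax over codes
    soft = probs @ codebook     # differentiable mixture of code vectors
    tokens = d2.argmin(axis=1)  # discrete token = index of the nearest code
    hard = codebook[tokens]     # looked-up embedding for those tokens
    return soft, tokens, hard

codebook = np.concatenate([2 * np.eye(4), -2 * np.eye(4)])  # K = 8 well-separated codes
rng = np.random.default_rng(0)
z = codebook[[1, 3, 5]] + 0.01 * rng.normal(size=(3, 4))

soft_cold, tokens, hard = soft_quantize(z, codebook, tau=1e-3)
soft_hot, _, _ = soft_quantize(z, codebook, tau=1e6)
print(tokens)                               # [1 3 5]
print(np.allclose(soft_cold, hard))         # True: low tau approaches hard VQ
print(np.allclose(soft_hot, 0, atol=1e-4))  # True: high tau averages all codes
```

Lowering `tau` makes the soft mixture converge to the hard lookup, which is one common way to bridge a differentiable training signal and discrete tokens at inference.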