Protein Autoregressive Modeling via Multiscale Structure Generation

Yanru Qu; Cheng-Yen Hsieh; Zaixiang Zheng; Ge Liu; Quanquan Gu

TL;DR

Protein Autoregressive Modeling (PAR) introduces the first multi-scale autoregressive approach to protein backbone generation via coarse-to-fine next-scale prediction. It combines deterministic multi-scale downsampling, a scale-aware autoregressive transformer, and a flow-based backbone decoder to model continuous $C_\alpha$ coordinates directly, avoiding discretization. Key contributions include Noisy Context Learning and Scheduled Sampling to mitigate exposure bias (the training-inference mismatch), strong zero-shot generalization to human-prompted layouts and motif scaffolding, and competitive unconditional generation with efficient multi-scale sampling. Because generation forms a global topology first and then refines details, the process is scalable and interpretable, and it supports flexible conditioning for protein design without fine-tuning.

Abstract

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Exploiting the hierarchical nature of proteins, PAR generates structures in a way that mimics sculpting a statue, forming a coarse topology first and refining structural details over scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Moreover, autoregressive models suffer from exposure bias, which is caused by the mismatch between the training and generation procedures and substantially degrades structure generation quality. We effectively alleviate this issue by adopting noisy context learning and scheduled sampling, enabling robust backbone generation. Notably, PAR exhibits strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without requiring fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions, produces backbones of high design quality, and exhibits favorable scaling behavior. Together, these properties establish PAR as a promising framework for protein structure generation.
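Component (i) can be illustrated with a minimal sketch. It assumes that a backbone is an `(L, 3)` array of $C_\alpha$ coordinates and that each coarser scale is obtained by linear interpolation along the residue index. The interpolation operator, the function names `downsample_backbone` and `multiscale`, and the scale sizes `(16, 32, 64)` are illustrative assumptions, not the operator or settings specified in the paper.

```python
import numpy as np

def downsample_backbone(x, size):
    """Resample an (L, 3) coordinate array to (size, 3).

    Assumption: linear interpolation along the residue index; the
    paper's actual downsampling operator may differ.
    """
    x = np.asarray(x, dtype=float)
    length = x.shape[0]
    src = np.arange(length)                  # original residue positions
    dst = np.linspace(0, length - 1, size)   # evenly spaced sample positions
    # Interpolate each Cartesian axis independently.
    return np.stack([np.interp(dst, src, x[:, k]) for k in range(3)], axis=1)

def multiscale(x, sizes):
    """Return coarse-to-fine representations, ending with the full backbone."""
    return [downsample_backbone(x, s) for s in sizes] + [np.asarray(x, dtype=float)]

# Toy input: a random walk standing in for a 128-residue backbone.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=(128, 3)), axis=0)
scales = multiscale(x, sizes=(16, 32, 64))   # hypothetical scale sizes
```

Because the operation is deterministic, the same backbone always yields the same multi-scale targets, which is what allows the coarse scales to serve as fixed training context for next-scale prediction.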

Paper Structure (63 sections, 7 equations, 10 figures, 13 tables)

Introduction
Background and Related Work
Flow and diffusion-based structure generative models.
Autoregressive modeling.
Protein Autoregressive Modeling
Multi-scale Protein Downsampling
Coarse-to-Fine Backbone Autoregressive Modeling
Autoregressive transformer for scale-wise conditioning.
Flow-based atomic decoder.
Multi-scale structure generation.
Mitigating Exposure Bias
Noisy context learning.
Scheduled sampling.
Experiments
Protein Backbone Generation
...and 48 more sections

Figures (10)

Figure 1: Overview of PAR. PAR comprises the autoregressive (AR) transformer $\mathcal{T}_\theta$ and the flow-based backbone decoder ${\mathbf{v}}_\theta$. During training, we downsample a backbone ${\mathbf{x}} \in \mathbb{R}^{L \times 3}$ into multi-scale representations $\{{\mathbf{x}}^1,\ldots,{\mathbf{x}}\}$. The AR transformer performs next-scale prediction, producing conditional embeddings $({\mathbf{z}}^1, \ldots,{\mathbf{z}}^n)$ from $(\textit{bos},\ldots,{\mathbf{x}}^{n-1})$. The shared flow-based decoder learns to denoise the backbone ${\mathbf{x}}^i$ at each scale conditioned on ${\mathbf{z}}^i$. At inference, PAR autoregressively generates ${\mathbf{x}}^i$ until the final structure ${\mathbf{x}}$ is constructed.
Figure 2: Samples generated by PAR over scales. We illustrate PAR's generation process across five scales. Much like sculpting a statue, the model first formulates the global structural layout at coarse scales and progressively refines the details at later scales.
Figure 3: Backbone generation with human prompt. Given a small number of points (e.g., 16) as a prompt, PAR can generate protein backbones that adhere to the global arrangement specified by these points, without any fine-tuning. For visualization, input points are interpolated to match the length of the generated structure.
Figure 4: Zero-shot motif scaffolding. Given a motif structure, PAR can generate diverse, plausible scaffold structures that accurately preserve the motif via teacher-forcing the motif coordinates at each scale, without additional conditioning or fine-tuning.
Figure 5: Scaling effects of PAR. Performance on four metrics over varying training steps and model sizes: (a) FPSD vs. PDB, (b) FPSD vs. AFDB, (c) fS(T), (d) sc-RMSD.
...and 5 more figures
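
The inference procedure described in the Figure 1 caption can be sketched as a loop over scales: obtain a conditioning signal from the scales generated so far, then integrate a flow from Gaussian noise to the current scale. The sketch below is a toy under stated assumptions and is not PAR's implementation. `toy_transformer` and `toy_velocity` are hypothetical stand-ins for $\mathcal{T}_\theta$ and ${\mathbf{v}}_\theta$, the conditioning is a coordinate array rather than a learned embedding, and plain Euler integration with 50 steps and the scale sizes `(16, 32, 64, 128)` are arbitrary choices.

```python
import numpy as np

def toy_transformer(context, size):
    """Hypothetical stand-in for the AR transformer.

    With an empty context (the bos case) it returns a fixed helical arc;
    otherwise it linearly upsamples the most recently generated scale.
    """
    dst = np.linspace(0.0, 1.0, size)
    if not context:
        angle = 2.0 * np.pi * dst
        return np.stack([np.cos(angle), np.sin(angle), angle], axis=1)
    prev = context[-1]
    src = np.linspace(0.0, 1.0, prev.shape[0])
    return np.stack([np.interp(dst, src, prev[:, k]) for k in range(3)], axis=1)

def toy_velocity(x, t, z):
    """Hypothetical stand-in for the decoder: the velocity of a
    straight-line path that reaches z at t = 1."""
    return (z - x) / (1.0 - t)

def sample_coarse_to_fine(sizes, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    context = []                                  # scales generated so far
    dt = 1.0 / steps
    for size in sizes:
        z = toy_transformer(context, size)        # conditioning for this scale
        x = rng.normal(size=(size, 3))            # start from Gaussian noise
        for k in range(steps):                    # Euler steps at t = 0, dt, ..., 1 - dt
            x = x + dt * toy_velocity(x, k * dt, z)
        context.append(x)                         # feed the result to the next scale
    return context

scales = sample_coarse_to_fine(sizes=(16, 32, 64, 128))
```

In this toy, every scale lands on its conditioning signal, so the example shows only the control flow: each scale is generated from noise, conditioned on all coarser scales, and the last element of `scales` plays the role of the final structure ${\mathbf{x}}$.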
