Relaxed Sequence Sampling for Diverse Protein Design

Joohwan Ko; Aristofanis Rontogiannis; Yih-En Andrew Ban; Axel Elaldi; Nicholas Franklin

TL;DR

Relaxed Sequence Sampling (RSS) addresses the limited diversity of single-trajectory relaxed sequence optimization by running MCMC over continuous amino-acid logits, combined with PLM-informed jumps. The method defines an energy $\mathcal{E}(\ell)=\mathcal{L}_{\mathrm{AF2}}(\ell)+\lambda\mathcal{L}_{\mathrm{PLM}}(\ell)$ and samples from $\pi(\ell)\propto \exp(-\beta\mathcal{E}(\ell))$ using a walk–jump kernel that blends gradient-based MALA moves with PLM-guided discrete swaps. Empirically, RSS yields roughly 5× more designable binder structures and 2–3× greater structural diversity at equal compute, and an ablation indicates that Soft-PLM is an accurate differentiable surrogate for discrete PLMs. These findings point to a principled, scalable approach to exploring protein design landscapes by integrating structure prediction with evolutionary priors in continuous space.
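To make the walk–jump idea concrete, the following is a minimal sketch of sampling from $\pi(\ell)\propto\exp(-\beta\mathcal{E}(\ell))$ over a logit matrix. It is not the authors' implementation. Several pieces are assumptions made purely for illustration: the two loss terms are replaced by toy quadratic stand-ins (no AlphaFold2 or ESM2 call is made), the jump is a uniform symmetric swap of two amino-acid logits at one position rather than a PLM-guided proposal, and all function names, step sizes, and the jump schedule are hypothetical.

```python
import numpy as np

def energy(logits, target, lam=0.1):
    # Toy stand-ins (assumptions): a quadratic "structure" loss in place of
    # L_AF2 and an L2 penalty in place of L_PLM.
    structure_loss = 0.5 * np.sum((logits - target) ** 2)
    prior_loss = 0.5 * np.sum(logits ** 2)
    return structure_loss + lam * prior_loss

def grad_energy(logits, target, lam=0.1):
    # Analytic gradient of the toy energy above.
    return (logits - target) + lam * logits

def mala_walk(logits, target, rng, beta=1.0, step=0.05):
    # Langevin proposal: x' = x - step * beta * grad E(x) + sqrt(2 * step) * noise.
    def drift(x):
        return x - step * beta * grad_energy(x, target)

    def log_q(to, frm):
        # Log proposal density q(to | frm), up to an additive constant.
        return -np.sum((to - drift(frm)) ** 2) / (4.0 * step)

    noise = rng.standard_normal(logits.shape)
    proposal = drift(logits) + np.sqrt(2.0 * step) * noise
    # Metropolis-Hastings correction for the asymmetric Langevin proposal.
    log_alpha = (-beta * energy(proposal, target) + log_q(logits, proposal)
                 + beta * energy(logits, target) - log_q(proposal, logits))
    accept = np.log(rng.uniform()) < log_alpha
    return (proposal if accept else logits), bool(accept)

def swap_jump(logits, target, rng, beta=1.0):
    # Hypothetical jump: swap the logits of two amino acids at one position.
    # The proposal is symmetric, so plain Metropolis acceptance applies.
    pos = rng.integers(logits.shape[0])
    a, b = rng.choice(logits.shape[1], size=2, replace=False)
    proposal = logits.copy()
    proposal[pos, [a, b]] = proposal[pos, [b, a]]
    log_alpha = -beta * (energy(proposal, target) - energy(logits, target))
    accept = np.log(rng.uniform()) < log_alpha
    return (proposal if accept else logits), bool(accept)

def run_chain(n_steps=500, length=8, n_aa=20, jump_every=10, seed=0):
    rng = np.random.default_rng(seed)
    target = rng.standard_normal((length, n_aa))  # toy low-energy region
    logits = np.full((length, n_aa), 5.0)         # start far from it
    start_energy = energy(logits, target)
    accepted = 0
    for t in range(n_steps):
        # Mostly gradient-guided walks, with an occasional discrete jump.
        move = swap_jump if (t + 1) % jump_every == 0 else mala_walk
        logits, ok = move(logits, target, rng)
        accepted += ok
    return logits, start_energy, energy(logits, target), accepted / n_steps

if __name__ == "__main__":
    final_logits, e_start, e_end, rate = run_chain()
    print(f"energy {e_start:.1f} -> {e_end:.1f}, acceptance rate {rate:.2f}")
```

Because each move carries its own accept/reject step against the same target, alternating them leaves $\pi$ invariant. In this sketch, swapping in real structural and language-model losses would only require replacing `energy` and `grad_energy`, for example with an automatic-differentiation framework.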

Abstract

Protein design using structure prediction models such as AlphaFold2 has shown remarkable success, but existing approaches like relaxed sequence optimization (RSO) rely on single-path gradient descent and ignore sequence-space constraints, limiting diversity and designability. We introduce Relaxed Sequence Sampling (RSS), a Markov chain Monte Carlo (MCMC) framework that integrates structural and evolutionary information for protein design. RSS operates in continuous logit space, combining gradient-guided exploration with protein language model-informed jumps. Its energy function couples AlphaFold2-derived structural objectives with ESM2-derived sequence priors, balancing accuracy and biological plausibility. In an in silico protein binder design task, RSS produces 5$\times$ more designable structures and 2-3$\times$ greater structural diversity than RSO baselines, at equal computational cost. These results highlight RSS as a principled approach for efficiently exploring the protein design landscape.
