One protein is all you need

Anton Bushuiev; Roman Bushuiev; Olga Pimenova; Nikola Zadorozhny; Raman Samusevich; Elisabet Manaskova; Rachel Seongeun Kim; Hannes Stärk; Jiri Sedlar; Martin Steinegger; Tomáš Pluskal; Josef Sivic

TL;DR

ProteinTTT customizes a protein language model to each target protein at test time. In a Y-shaped setup, a shared backbone $f$ feeds both a self-supervised head and a downstream head. ProteinTTT fine-tunes the backbone on the single target sequence $x$ with masked language modeling to minimize the perplexity of $x$, leaving the downstream head unchanged. The best adapted parameters $\theta_x$ are then selected using a confidence signal (e.g., $c = \text{pLDDT}$), and fine-tuning uses SGD with LoRA so that the approach scales to large models. Empirically, ProteinTTT delivers consistent gains across models and datasets in structure, fitness, and function prediction. It achieves new state-of-the-art results in protein fitness prediction (ProteinGym) and improves difficult cases in antibody–antigen loop modeling and in the Big Fantastic Virus Database. The approach is practical, data-efficient (it requires no additional data), and extensible (e.g., to MSA customization and various downstream heads), offering a versatile tool for researchers focusing on single proteins or specific protein families.
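The adaptation loop described above can be illustrated with a minimal, self-contained sketch. It makes strong simplifying assumptions and is not the authors' implementation. The "backbone" is a hypothetical toy model with a single logit vector shared across all positions, so it ignores context. Training uses plain SGD without LoRA. The confidence signal is a hypothetical proxy (negative perplexity) standing in for pLDDT, since the toy has no structure head. At each step, the sketch masks a random subset $M$ of positions in $x$, takes a gradient step on the standard masked language modeling loss $-\frac{1}{|M|}\sum_{i \in M} \log p_\theta(x_i \mid x_{\setminus M})$, and keeps the parameters with the highest confidence seen so far. All function names are illustrative.

```python
import math
import random

# Toy alphabet: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlm_loss_and_grad(theta, seq, masked_positions):
    """Mean cross-entropy on the masked positions of `seq` and its gradient.

    Hypothetical toy backbone: one logit vector `theta` shared by every
    position, so predictions ignore context. A real protein language model
    would condition on the unmasked residues.
    """
    probs = softmax(theta)
    loss = 0.0
    grad = [0.0] * len(theta)
    for pos in masked_positions:
        target = AA_INDEX[seq[pos]]
        loss -= math.log(probs[target])
        # d(-log softmax(theta)[target]) / d theta_k = p_k - [k == target]
        for k, p in enumerate(probs):
            grad[k] += p - (1.0 if k == target else 0.0)
    n = len(masked_positions)
    return loss / n, [g / n for g in grad]


def perplexity(theta, seq):
    """exp(mean negative log-likelihood) over all residues of `seq`."""
    probs = softmax(theta)
    nll = -sum(math.log(probs[AA_INDEX[aa]]) for aa in seq) / len(seq)
    return math.exp(nll)


def protein_ttt(theta0, seq, confidence, steps=30, lr=0.5, mask_frac=0.15, seed=0):
    """Test-time training on one sequence with confidence-based selection.

    Only the backbone parameters are updated; a downstream head (not modeled
    here) would stay frozen. Returns the parameters with the highest
    confidence seen, including the unadapted starting point.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    best_theta, best_conf = list(theta), confidence(theta, seq)
    n_mask = max(1, int(mask_frac * len(seq)))
    for _ in range(steps):
        masked = rng.sample(range(len(seq)), n_mask)  # fresh random mask each step
        _, grad = mlm_loss_and_grad(theta, seq, masked)
        theta = [t - lr * g for t, g in zip(theta, grad)]  # plain SGD step
        conf = confidence(theta, seq)
        if conf > best_conf:
            best_theta, best_conf = list(theta), conf
    return best_theta, best_conf


if __name__ == "__main__":
    seq = "MKKLLPTAAAGLLLLAAQPAMAKK"  # arbitrary example sequence
    theta0 = [0.0] * len(AMINO_ACIDS)  # uniform predictions before adaptation
    # Hypothetical confidence proxy: negative perplexity (stands in for pLDDT).
    theta_x, conf = protein_ttt(theta0, seq, confidence=lambda th, s: -perplexity(th, s))
    print(f"perplexity before: {perplexity(theta0, seq):.2f}")
    print(f"perplexity after:  {perplexity(theta_x, seq):.2f}")
```

Because the starting parameters are also candidates for selection, the returned $\theta_x$ is never less confident than the original model under the chosen signal; in this sketch, that means its perplexity on $x$ is never higher than before adaptation.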

Abstract

Generalization beyond training data remains a central challenge in machine learning for biology. A common way to enhance generalization is self-supervised pre-training on large datasets. However, aiming to perform well on all possible proteins can limit a model's capacity to excel on any specific one, whereas experimentalists typically need accurate predictions for the individual proteins they study, which are often not covered by the training data. To address this limitation, we propose a method that enables self-supervised customization of protein language models to one target protein at a time, on the fly, and without assuming any additional data. We show that our Protein Test-Time Training (ProteinTTT) method consistently enhances generalization across different models, model sizes, and datasets. ProteinTTT improves structure prediction for challenging targets, achieves new state-of-the-art results on protein fitness prediction, and enhances function prediction on two tasks. Through two challenging case studies, we also show that customization via ProteinTTT achieves more accurate antibody–antigen loop modeling and improves 19% of structures in the Big Fantastic Virus Database, delivering better predictions where general-purpose AlphaFold2 and ESMFold struggle.
