Understanding protein function with a multimodal retrieval-augmented foundation model

Timothy Fei Truong, Tristan Bepler

TL;DR

This work introduces PoET-2, a multimodal, retrieval-augmented protein foundation model that incorporates in-context learning of family-specific evolutionary constraints with optional structure conditioning to learn generative distributions over protein sequences.

Abstract

Protein language models (PLMs) learn probability distributions over natural protein sequences. When these models learn from hundreds of millions of natural protein sequences, protein understanding and design capabilities emerge. Recent work has shown that scaling these models improves structure prediction but does not seem to improve mutation understanding or representation quality for protein function prediction. We introduce PoET-2, a multimodal, retrieval-augmented protein foundation model that incorporates in-context learning of family-specific evolutionary constraints with optional structure conditioning to learn generative distributions over protein sequences. PoET-2 uses a hierarchical transformer encoder that is equivariant to sequence context ordering and a dual decoder architecture with both causal and masked language modeling objectives, allowing it to operate in both fully generative and bidirectional representation learning modes. PoET-2 achieves state-of-the-art performance on zero-shot variant effect prediction, excelling at scoring variants with multiple mutations and challenging indel mutations. In supervised settings, PoET-2 embeddings outperform previous methods for learning sequence-function relationships, especially with small datasets. This work highlights the benefits of combining retrieval augmentation with multimodal, family-centric modeling for advancing protein foundation models.
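
The abstract states that the encoder is equivariant to the ordering of the sequence context. As a rough illustration of what that property means (not of how PoET-2 implements it), the sketch below checks permutation equivariance for a toy set encoder: reordering the input sequences should reorder the outputs in the same way and change nothing else. The names `toy_encode`, `positional_encode`, and `is_equivariant` are hypothetical, and the length-based feature is an assumption chosen only to keep the example small.

```python
import math
from itertools import permutations


def toy_encode(context):
    """Toy set encoder (a stand-in, not PoET-2): each output combines a
    per-sequence feature with an order-independent summary of the context."""
    feats = [float(len(seq)) for seq in context]  # hypothetical feature
    pooled = sum(feats) / len(feats)              # mean pooling ignores order
    return [f + pooled for f in feats]


def positional_encode(context):
    """Counter-example: the output depends on each sequence's position."""
    return [float(len(seq)) + i for i, seq in enumerate(context)]


def is_equivariant(encode, context):
    """Return True if permuting the inputs permutes the outputs identically."""
    base = encode(context)
    for perm in permutations(range(len(context))):
        shuffled = [context[i] for i in perm]
        expected = [base[i] for i in perm]
        actual = encode(shuffled)
        if not all(math.isclose(a, e) for a, e in zip(actual, expected)):
            return False
    return True


context = ["MKTAYIAK", "MKTAYLAKQR", "MRTAYIAK"]
print(is_equivariant(toy_encode, context))         # order-independent encoder
print(is_equivariant(positional_encode, context))  # order-dependent encoder
```

An encoder with this property treats the retrieved homologs as a set, so the order in which they are supplied should not affect the result.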

Paper Structure

This paper contains 67 sections, 5 equations, 8 figures, 22 tables, and 5 algorithms.

Figures (8)

  • Figure 1: PoET-2 architecture and framework for zero-shot and supervised variant effect prediction. PoET-2 encodes a set of evolutionarily relevant proteins with an equivariant encoder, and decodes proteins with either of two decoders. Log-likelihoods from the autoregressive decoder are used for zero-shot prediction, and are combined with embeddings from the bidirectional decoder for supervised prediction.
  • Figure 2: Impact of training set size on the performance of Gaussian Process (GP) models leveraging various foundation models, evaluated on the supervised DMS substitutions benchmark.
  • Figure 3: Structure-based attention bias.
  • Figure 4: Visualization of insertion decoding scheme.
  • Figure 5: Plot of log-likelihood vs. length for random UniRef50 protein families.
  • ...and 3 more figures
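
The Figure 1 caption notes that log-likelihoods from the autoregressive decoder are used for zero-shot prediction. A common way to turn sequence log-likelihoods into a variant score is the difference between the variant's and the wild type's log-likelihood; whether PoET-2 uses exactly this form is an assumption here. The sketch below illustrates the idea with a smoothed bigram model fitted to a few made-up family sequences, which is only a stand-in for a real autoregressive decoder. All names (`fit_bigram`, `sequence_log_likelihood`, `zero_shot_score`) and the example sequences are hypothetical.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
START, STOP = "^", "$"


def fit_bigram(family, pseudocount=1.0):
    """Fit a smoothed bigram model on family sequences.

    This is a toy stand-in for an autoregressive decoder, not PoET-2.
    """
    counts = defaultdict(Counter)
    for seq in family:
        tokens = START + seq + STOP
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    vocab_size = len(ALPHABET) + 1  # 20 amino acids plus the stop token

    def log_prob(prev, nxt):
        total = sum(counts[prev].values()) + pseudocount * vocab_size
        return math.log((counts[prev][nxt] + pseudocount) / total)

    return log_prob


def sequence_log_likelihood(log_prob, seq):
    """Sum next-token log-probabilities left to right, including the stop."""
    tokens = START + seq + STOP
    return sum(log_prob(p, n) for p, n in zip(tokens, tokens[1:]))


def zero_shot_score(log_prob, wild_type, variant):
    """Log-likelihood ratio of variant vs. wild type (higher = more plausible)."""
    return (sequence_log_likelihood(log_prob, variant)
            - sequence_log_likelihood(log_prob, wild_type))


family = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK"]  # made-up homologs
log_prob = fit_bigram(family)
wild_type = "MKTAYIAK"

for variant in ["MKTAYLAK", "MKTAWIAK", "MKTAYIAAK"]:  # last one is an insertion
    print(variant, round(zero_shot_score(log_prob, wild_type, variant), 3))
```

Because the score is computed over whole sequences rather than per aligned position, variants with several substitutions or with insertions and deletions are scored by the same function with no special handling.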