Multi-modal Transfer Learning between Biological Foundation Models

Juan Jose Garau-Luis, Patrick Bordes, Liam Gonzalez, Masa Roller, Bernardo P. de Almeida, Lorenz Hexemer, Christopher Blum, Stefan Laurent, Jan Grzegorzewski, Maren Lang, Thomas Pierrot, Guillaume Richard

TL;DR

This paper proposes a multi-modal model that connects DNA, RNA, and proteins by combining information from pre-trained modality-specific encoders. The model accurately predicts differential transcript expression, outperforms existing methods, and benefits from using multiple modalities.

Abstract

Biological sequences encode fundamental instructions for the building blocks of life, in the form of DNA, RNA, and proteins. Modeling these sequences is key to understanding disease mechanisms and is an active research area in computational biology. Recently, Large Language Models have shown great promise in solving certain biological tasks, but current approaches are limited to a single sequence modality (DNA, RNA, or protein). Key problems in genomics intrinsically involve multiple modalities, but it remains unclear how to adapt general-purpose sequence models to those cases. In this work, we propose a multi-modal model that connects DNA, RNA, and proteins by leveraging information from different pre-trained modality-specific encoders. We demonstrate its capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same gene (i.e. same DNA sequence) and map to different transcription expression levels across various human tissues. We show that our model, dubbed IsoFormer, is able to accurately predict differential transcript expression, outperforming existing methods and benefiting from the use of multiple modalities. Our framework also achieves efficient knowledge transfer, both from the encoders' pre-training and between modalities. We open-source our model, paving the way for new multi-modal gene expression approaches.

Paper Structure (20 sections, 5 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Our aggregation module compiles information from the different biological sequence modalities of DNA, RNA, and proteins by using successive cross-attention layers and residual connections.
  • Figure 2: a) Three types of biological sequences are considered in this work: DNA, RNA, and proteins. These sequences are composed of nucleotides (DNA and RNA) or amino acids (protein). In a single gene, several coding regions, or exons, can be combined to create different RNA transcript isoforms and proteins. The abundance of each isoform is tissue-dependent and its measurement is called expression level. b) IsoFormer leverages pre-trained encoders that produce modality-specific embeddings, which are then aggregated into multi-modal embeddings. These are used to predict the expression of a given RNA transcript isoform across multiple tissues.
  • Figure 3: Left: Performance of IsoFormer and Enformer (Avsec et al., 2021) per tissue on a selected subset of tissues. Right: Changes in attention in the RNA encoder during fine-tuning. These scores are reported for three genomic elements of interest for all heads and layers of the RNA encoder.
  • Figure 4: Different aggregation strategies compared during the ablation studies. The figures show the specific case for obtaining multi-modal DNA embeddings $\mathbf{h}_{\text{dna}}^{\prime}$; the same structure is used to obtain multi-modal RNA and protein embeddings ($\mathbf{h}_{\text{rna}}^{\prime}$ and $\mathbf{h}_{\text{prot}}^{\prime}$, respectively). In all cases, the Resampler module is a Perceiver Resampler (Jaegle et al., 2021) and the Mean pooling block is the Adaptive mean pooling operator used in Liu et al. (2024).
  • Figure 5: Average and standard deviation of expression values across transcripts per tissue.
  • ...and 1 more figure
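
The aggregation idea in the Figure 1 and Figure 4 captions (successive cross-attention layers with residual connections, producing a multi-modal embedding such as $\mathbf{h}_{\text{dna}}^{\prime}$) can be illustrated with a minimal sketch. This is not the authors' implementation: the single attention head, the embedding size `d = 4`, the random projection matrices, the order in which RNA and protein are attended to, and all function names (`cross_attention`, `aggregate`, `random_matrix`) are assumptions made for illustration. Layer normalization, resampling, and pooling are omitted.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head cross-attention: queries attend over the context tokens."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


def aggregate(target: Matrix, others: list[Matrix], params: list[tuple]) -> Matrix:
    """Apply one cross-attention layer per other modality, each followed by
    a residual connection, so the output keeps the target's shape."""
    h = target
    for context, (w_q, w_k, w_v) in zip(others, params):
        update = cross_attention(h, context, w_q, w_k, w_v)
        h = [[a + b for a, b in zip(h_row, u_row)] for h_row, u_row in zip(h, update)]
    return h


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical stand-in for encoder outputs or learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4  # assumed shared embedding size
h_dna = random_matrix(6, d, rng)   # 6 DNA tokens
h_rna = random_matrix(5, d, rng)   # 5 RNA tokens
h_prot = random_matrix(3, d, rng)  # 3 protein tokens

# One (W_q, W_k, W_v) triple per cross-attention layer.
params = [tuple(random_matrix(d, d, rng) for _ in range(3)) for _ in range(2)]

h_dna_prime = aggregate(h_dna, [h_rna, h_prot], params)
print(len(h_dna_prime), len(h_dna_prime[0]))  # prints: 6 4
```

Because each layer adds its update to the input, the multi-modal DNA embedding keeps one vector per DNA token; under the same assumptions, swapping the roles of the inputs would give the analogous RNA and protein embeddings.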