TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery

Kun Wu; Zixu Wang; Xiulong Yang; Yangyang Chen; Zhenqi Han; Jialu Zhang; Lizhuang Liu

TL;DR

TransMA is an explainable, multi-modal model that predicts LNP transfection efficiency by fusing 3D geometric features with 1D SMILES representations through a molecule-level attention mechanism. The framework combines a 3D Transformer pre-trained on large molecular data, a state-space SMILES model (Mamba), and a mol-attention block that yields atom-level explanations; training uses a hybrid loss that includes a triplet term. It achieves state-of-the-art performance on scaffold and cliff splits across Hela and RAW 264.7 cell lines, and its high-attention atoms correspond to the key structural differences in transfection cliffs, which makes the explanations actionable. Evaluation on external data further indicates robust generalization: the predicted ordering aligns with the measured transfection efficiencies, suggesting practical utility for LNP design and initial screening.
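The hybrid loss mentioned above can be illustrated with a minimal sketch. This page does not give the exact formulation, so everything below is an assumption made for illustration: mean squared error as the regression term, Euclidean distance between embeddings, a margin of 1.0, and a weight `alpha` on the triplet term. The function names are hypothetical and are not taken from the TransMA code.

```python
import math

def mse(preds, targets):
    """Mean squared error between predicted and measured efficiencies."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_term(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull the positive closer to the anchor
    than the negative by at least `margin` (assumed value)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def hybrid_loss(preds, targets, triplets, alpha=0.5, margin=1.0):
    """Regression loss plus a weighted triplet term.

    `triplets` is a list of (anchor, positive, negative) embeddings.
    `alpha` is an assumed weighting, not a value from the paper.
    """
    regression = mse(preds, targets)
    if triplets:
        triplet = sum(triplet_term(a, p, n, margin) for a, p, n in triplets) / len(triplets)
    else:
        triplet = 0.0
    return regression + alpha * triplet

# Example: one violated triplet (the positive is farther away than the negative).
loss = hybrid_loss([1.0, 2.0], [1.0, 4.0], [([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])])
print(loss)
```

In a setup like this, the triplet term is zero whenever the positive already sits closer to the anchor than the negative by the margin, so only poorly separated triplets contribute to the gradient.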

Abstract

As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, the screening process for LNPs is time-consuming and costly. To expedite the identification of mRNA delivery systems with high transfection efficiency, we propose TransMA, an explainable model for predicting LNP transfection efficiency. TransMA employs a multi-modal molecular structure fusion architecture in which a fine-grained atomic spatial relationship extractor, the molecule 3D Transformer, captures three-dimensional spatial features of the molecule, and a coarse-grained atomic sequence extractor, the molecule Mamba, captures one-dimensional molecular features. We design a mol-attention mechanism block that aligns the coarse-grained and fine-grained atomic features and captures the relationships between atomic spatial and sequential structures. TransMA achieves state-of-the-art performance in predicting transfection efficiency under the scaffold and cliff data splitting methods on the largest LNP dataset currently available, which covers the Hela and RAW cell lines. Moreover, we find that TransMA captures the relationship between subtle structural changes and large variations in transfection efficiency, providing valuable insights for LNP design. Additionally, TransMA's predictions on external transfection efficiency data preserve the ordering of the measured transfection efficiencies, demonstrating its robust generalization capability. The code, model, and data are publicly available at https://github.com/wklix/TransMA/tree/master. We hope that high-accuracy transfection prediction models can in the future aid LNP design and initial screening, thereby helping to accelerate the mRNA design process.
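The claim that predictions keep a consistent order with measured efficiencies can be checked with a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. It is one possible way to quantify order agreement, not necessarily the procedure the authors used, and the sample values are made up for illustration.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

def spearman(predicted, measured):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))

# Illustrative values only (not data from the paper).
predicted = [0.1, 0.4, 0.9]
measured = [1.0, 2.0, 5.0]
print(spearman(predicted, measured))
```

A value of 1 means the predicted ordering matches the measured ordering exactly, which is the property that matters when a model is used to shortlist candidates for screening.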

Paper Structure (17 sections, 12 equations, 9 figures, 2 tables)

This paper contains 17 sections, 12 equations, 9 figures, 2 tables.

Introduction
Materials and methods
The method of TransMA to predict LNPs transfection efficiency
Molecule 3D Transformer
Molecule Mamba
Mol-attention mechanism block
Loss function
Experimental Section
Dataset
Experimental processing
Comparison with representative deep learning-based molecular property prediction models
Ablation Experiment
Analysis of model interpretability
Transfection cliffs
Model interpretability
...and 2 more sections

Figures (9)

Figure 1: The model takes multimodal structural information about ionizable lipids as input: three-dimensional structural details (atomic type sequences, three-dimensional coordinates, distance matrices, and bond type matrices) and a one-dimensional SMILES representation. Features are extracted from the two inputs separately, using a self-attention model and a state space model, respectively. The resulting features are fused and visualized through the mol-attention mechanism to reveal the atoms that have a significant impact on transfection efficiency prediction.
Figure 2: Left: The architecture of the molecule 3D Transformer adopts a pre-training and fine-tuning approach. The pre-training task involves masked language modeling and noise coordinate prediction. Fine-tuning is conducted using molecular prediction heads for transfection efficiency prediction. Right: The input of molecule Mamba is the SMILES representation of ionizable lipids. Through the selective scan operation and the discretized SSM state equation, high-dimensional features of the molecule are extracted.
Figure 3: The distribution of the transfection efficiency dataset under two data splitting methods in Hela and RAW 264.7 cell lines. The outer ring of each pie chart represents the training set distribution, the middle ring represents the validation set distribution, and the inner ring represents the test set distribution. (A) The distribution of the transfection efficiency dataset under the scaffold-based train-val-test split in Hela cell line. (B) The distribution of the transfection efficiency dataset under the cliff-based train-val-test split in Hela cell line. (C) The distribution of the transfection efficiency dataset in Hela cell line. (D) The distribution of the transfection efficiency dataset under the scaffold-based train-val-test split in RAW 264.7 cell line. (E) The distribution of the transfection efficiency dataset under the cliff-based train-val-test split in RAW 264.7 cell line. (F) The distribution of the transfection efficiency dataset in RAW 264.7 cell line.
Figure 4: Comparison of TransMA performance in predicting transfection efficiency with five different models under the scaffold and cliff data splitting methods for Hela and RAW 264.7 cell lines. (A) The box plots comparing TransMA with five models in Hela cell line. (B) The box plots comparing TransMA with five models in RAW 264.7 cell line.
Figure 5: TransMA multi-modal feature fusion process based on molecule 3D Transformer and molecule Mamba under the scaffold and cliff data splitting methods in Hela and RAW 264.7 cell lines. (A) The UMAP plot of the feature fusion process under scaffold data splitting in Hela cell line. (B) The UMAP plot of the feature fusion process under cliff data splitting in Hela cell line. (C) The UMAP plot of the feature fusion process under scaffold data splitting in RAW 264.7 cell line. (D) The UMAP plot of the feature fusion process under cliff data splitting in RAW 264.7 cell line.
...and 4 more figures
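
The discretized SSM state equation named in the Figure 2 caption can be sketched as a sequential scan. The code below uses the zero-order-hold discretization commonly described for Mamba-style models, with a diagonal state matrix and a single input channel. It is a simplified illustration under stated assumptions: only the step size `delta` varies per token here, whereas a selective SSM also makes `B` and `C` input-dependent, and TransMA's exact parameterization is not given on this page. All function names are hypothetical.

```python
import math

def zoh_discretize(a, b, delta):
    """Zero-order-hold discretization for one diagonal entry.

    a_bar = exp(delta * a)
    b_bar = (exp(delta * a) - 1) / a * b   (requires a != 0)
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def ssm_scan(xs, deltas, a_diag, b, c):
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t and y_t = c . h_t over a sequence.

    xs      scalar inputs, one per token
    deltas  per-token step sizes (the input-dependent part in this sketch)
    a_diag  diagonal of the continuous state matrix (negative for stability)
    b, c    input and output projections, one entry per state dimension
    """
    h = [0.0] * len(a_diag)
    ys = []
    for x, delta in zip(xs, deltas):
        for i, a in enumerate(a_diag):
            a_bar, b_bar = zoh_discretize(a, b[i], delta)
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys

# Toy run: one state dimension, step size ln(2) so that a_bar is about 0.5.
step = math.log(2.0)
outputs = ssm_scan([1.0, 1.0], [step, step], [-1.0], [1.0], [2.0])
print(outputs)
```

A larger `delta` shrinks `a_bar` toward zero and so forgets the previous state faster, which is how a per-token step size lets the scan emphasize or skip individual SMILES tokens.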
