MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language
Yoel Shoshan, Moshiko Raboh, Michal Ozery-Flato, Vadim Ratner, Alex Golts, Jeffrey K. Weber, Ella Barkan, Simona Rabinovici-Cohen, Sagi Polaczek, Ido Amos, Ben Shapira, Liam Hazan, Matan Ninio, Sivan Ravid, Michael M. Danziger, Yosi Shamay, Sharon Kurant, Joseph A. Morrone, Parthasarathy Suryanarayanan, Michal Rosen-Zvi, Efrat Hexter
TL;DR
MAMMAL (Molecular Aligned Multi-Modal Architecture and Language) is a cross-domain foundation model that unifies proteins, small molecules, and transcriptomic data within a single encoder–decoder Transformer framework. It employs a structured, multi-domain prompt syntax and continuous scalar embeddings to support classification, regression, and generation tasks across the drug discovery pipeline, and it is pretrained on ~2 billion samples from six public datasets. Across 11 downstream benchmarks, MAMMAL achieves state-of-the-art results on 9 tasks and remains competitive on the remaining two, demonstrating strong cross-domain transfer and task versatility. In a comparison with AlphaFold 3 on antibody–antigen and nanobody–antigen binding, MAMMAL classifies better on 3 of 4 targets, underscoring the value of integrated, sequence-based cross-domain learning for predictive design in biomedicine. The authors release open code and pretrained weights to facilitate replication and further development in cross-domain biomedical AI.
Abstract
Large language models applied to vast biological datasets have the potential to transform biology by uncovering disease mechanisms and accelerating drug development. However, current models are often siloed, trained separately on small molecules, proteins, or transcriptomic data, which limits their ability to capture complex, multi-modal interactions. Effective drug discovery requires computational tools that integrate multiple biological entities while supporting both prediction and generation, a challenge existing models struggle to address. For this purpose, we present MAMMAL (Molecular Aligned Multi-Modal Architecture and Language), a versatile method for building a multi-task foundation model that learns from large-scale biological datasets across diverse modalities, including proteins, small molecules, and omics. MAMMAL's structured prompt syntax supports classification, regression, and generation tasks while handling both token and scalar inputs and outputs. Evaluated on eleven diverse downstream tasks, it reaches a new state of the art (SOTA) in nine tasks and is comparable to SOTA in two tasks, all within a unified architecture, unlike prior task-specific models. Additionally, we evaluated the binding prediction capabilities of AlphaFold 3 on antibody-antigen and nanobody-antigen complexes; MAMMAL showed significantly better classification performance on 3 out of 4 targets. The model code and pretrained weights are publicly available at https://github.com/BiomedSciAI/biomed-multi-alignment and https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m
