Omni-Mol: Multitask Molecular Model for Any-to-any Modalities
Chengxin Hu, Hao Li, Yihe Yuan, Zezheng Song, Chenyang Zhao, Haixin Wang
TL;DR
Omni-Mol addresses the need for a truly generalist molecular LLM by unifying 16 small-molecule tasks across four task types (Mol2Mol, Mol2Text, Mol2Num, and Text2Mol) in a single framework. It introduces the Omni-Mol dataset (~1.42M samples) and two key innovations, Gradient Adaptive LoRA (GAL) and Mixture-of-GAL-Experts (MoGE), to balance cross-task learning against task-specific needs. A two-stage training regime (graph alignment followed by unified fine-tuning), combined with an auxiliary load-balancing objective, yields state-of-the-art results on 13 of 16 tasks and scales well with both data and model size. Analyses indicate that representations converge toward a universal solution space as the number of tasks grows, highlighting the potential for generalist AI chemists. However, the approach is computationally intensive and limited to small molecules, which points to future work on larger biomolecular systems and their interactions.
Abstract
In the molecular domain, numerous studies have explored the use of multimodal large language models (LLMs) to construct a general-purpose, multi-task molecular model. However, these efforts are still far from achieving a truly universal molecular model. We identify three key challenges in this endeavor: (1) Existing molecular task datasets are typically small in scale and lack comprehensive domain coverage. (2) Tasks from different molecular subfields are difficult to learn jointly with LLMs because of significant distributional shifts and competition among tasks, which make the learning process unstable. (3) Both inter-task and intra-task molecular representations demand different intrinsic dimensions in the language space, making it challenging to balance redundancy against insufficiency in language model representations. To address these challenges, we categorize existing small-molecule tasks into four types: Mol2Mol, Mol2Text, Mol2Num, and Text2Mol. We then collect a dataset encompassing over 16 tasks with more than 1.4 million samples, making it the largest molecular instruction-tuning dataset to date. Leveraging the extensive pretraining of LLMs on existing chemical literature, we propose a novel multimodal LLM framework, named Omni-Mol, which unifies all small-molecule tasks and supports both molecular generation and understanding. The core of Omni-Mol is our proposed Mixture-of-GAL-Experts (MoGE), which dynamically adapts to the intrinsic rank of different tasks. This mixture-of-experts architecture enhances the model's ability to handle diverse tasks and modalities effectively. Our model achieves unified instruction tuning across 16 tasks and attains state-of-the-art performance on 13 of them. Extensive experiments further demonstrate the scalability and versatility of Omni-Mol.
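To make the mixture-of-experts idea concrete, the following is a minimal sketch of a mixture of low-rank adapters with an auxiliary load-balancing term. It is not the authors' implementation, and it rests on several assumptions: each expert is a plain LoRA adapter (the gradient-adaptive scaling of GAL is not described in this summary and is replaced by a fixed `scale`), the router is a top-k softmax gate, the auxiliary loss follows the commonly used Switch-Transformer-style form (number of experts times the sum over experts of dispatch fraction times mean router probability), and the frozen base layer is stood in for by the identity. All class and function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class LoRAExpert:
    """Plain low-rank adapter: delta(x) = scale * B @ (A @ x).

    Assumption: a fixed `scale` replaces GAL's gradient-adaptive scaling.
    """

    def __init__(self, dim, rank, rng, scale=1.0):
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(dim)]  # zero init, as in standard LoRA
        self.scale = scale

    def __call__(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


def route_top_k(router_weights, x, k):
    """Return full router probabilities and renormalized gates for the top-k experts."""
    probs = softmax(matvec(router_weights, x))
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    gates = {i: probs[i] / norm for i in top}
    return probs, gates


def moe_lora_forward(tokens, experts, router_weights, k=2):
    """Apply a mixture of LoRA experts to each token and compute the auxiliary loss.

    The frozen base layer is replaced by the identity, so output = x + gated deltas.
    """
    num_experts = len(experts)
    dispatch_counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    outputs = []
    for x in tokens:
        probs, gates = route_top_k(router_weights, x, k)
        delta = [0.0] * len(x)
        for idx, gate in gates.items():
            dispatch_counts[idx] += 1
            delta = [d + gate * e for d, e in zip(delta, experts[idx](x))]
        prob_sums = [s + p for s, p in zip(prob_sums, probs)]
        outputs.append([xi + di for xi, di in zip(x, delta)])
    n = len(tokens)
    frac = [c / (n * k) for c in dispatch_counts]  # share of routing slots per expert
    mean_prob = [s / n for s in prob_sums]         # mean router probability per expert
    # Switch-Transformer-style balance term (assumed form, not taken from the paper).
    aux_loss = num_experts * sum(f * p for f, p in zip(frac, mean_prob))
    return outputs, aux_loss
```

In a training loop, `aux_loss` would be added to the task loss with a small coefficient so that the router does not collapse onto a few experts. Because every `B` matrix starts at zero, the adapters initially leave the input unchanged, which mirrors the usual LoRA initialization.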
