MT-Mol: Multi Agent System with Tool-based Reasoning for Molecular Optimization
Hyomin Kim, Yunhui Jang, Sungsoo Ahn
TL;DR
MT-Mol introduces a multi-agent LLM framework for molecular optimization that grounds design in tool-guided reasoning using RDKit. By decomposing tasks into analyst, scientist, verifier, and reviewer roles, and enforcing structured, stepwise reasoning with tool-informed feedback, MT-Mol achieves state-of-the-art performance on the PMO-1K benchmark across 17 of 23 tasks while maintaining chemical interpretability. The approach demonstrates the value of explicit collaboration and domain-specific tool integration in generating chemically valid, high-quality molecules under budget-constrained settings. This framework has practical implications for transparent AI-assisted molecular design and educational dissemination of chemical design reasoning, albeit with limitations related to tooling scope and language coverage.
Abstract
Large language models (LLMs) have considerable potential for molecular optimization, as they can draw on external chemistry tools and enable collaborative interactions to iteratively refine molecular candidates. However, this potential remains underexplored, particularly in the context of structured reasoning, interpretability, and comprehensive tool-grounded molecular optimization. To address this gap, we introduce MT-Mol, a multi-agent framework for molecular optimization that leverages tool-guided reasoning and role-specialized LLM agents. Our system incorporates a comprehensive set of RDKit tools, categorized into five distinct domains: structural descriptors, electronic and topological features, fragment-based functional groups, molecular representations, and miscellaneous chemical properties. Each category is managed by an expert analyst agent, responsible for extracting task-relevant tools and enabling interpretable, chemically grounded feedback. MT-Mol produces molecules with tool-aligned and stepwise reasoning through the interaction between the analyst agents, a molecule-generating scientist, a reasoning-output verifier, and a reviewer agent. Our framework achieves state-of-the-art performance on 17 out of 23 tasks of the PMO-1K benchmark.
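The interaction between the roles described above can be illustrated with a minimal sketch. It is not the authors' implementation: every function name here (`analyst_report`, `scientist`, `verifier`, `reviewer`, `optimize`) is hypothetical, the LLM agents are replaced by deterministic stubs, the "tools" are crude character counts on SMILES strings standing in for RDKit calls, and the toy objective (reach a target heavy-atom count) is an assumption chosen only to make the loop runnable. Only two of the five tool categories are shown.

```python
from typing import Callable, Dict, Tuple

Tool = Callable[[str], float]

# Hypothetical stand-in tools on SMILES strings. A real system would call
# RDKit here; these are crude character counts, not chemistry.
def heavy_atom_count(smiles: str) -> float:
    return float(sum(ch.isalpha() for ch in smiles))

def oxygen_count(smiles: str) -> float:
    return float(smiles.count("O"))

# One analyst per tool category (two of the five categories are shown).
ANALYSTS: Dict[str, Dict[str, Tool]] = {
    "structural descriptors": {"heavy_atom_count": heavy_atom_count},
    "fragment-based functional groups": {"oxygen_count": oxygen_count},
}

def analyst_report(category: str, smiles: str) -> Dict[str, float]:
    """Run every tool the analyst owns and return named values."""
    return {name: tool(smiles) for name, tool in ANALYSTS[category].items()}

def scientist(smiles: str, reports: Dict[str, Dict[str, float]],
              feedback: str, target: int) -> Tuple[str, Dict[str, float]]:
    """Stub for the LLM scientist: propose a molecule plus a checkable claim."""
    n = reports["structural descriptors"]["heavy_atom_count"]
    if n < target:
        return smiles + "C", {"heavy_atom_count": n + 1}  # extend the chain
    return smiles, {"heavy_atom_count": n}                # nothing to change

def verifier(candidate: str, claim: Dict[str, float]) -> bool:
    """Check that the scientist's stated claim matches the tool output."""
    return heavy_atom_count(candidate) == claim["heavy_atom_count"]

def score(smiles: str, target: int) -> float:
    """Toy objective (assumption): 0 at the target size, negative otherwise."""
    return 0.0 - abs(heavy_atom_count(smiles) - target)

def reviewer(s: float) -> str:
    """Stub for the reviewer: turn the score into textual feedback."""
    return "target reached" if s == 0 else f"off by {abs(s):.0f} heavy atoms"

def optimize(seed: str, budget: int, target: int = 6) -> Tuple[str, float]:
    """Run the analyst -> scientist -> verifier -> reviewer loop `budget` times."""
    best, best_score, feedback = seed, score(seed, target), ""
    for _ in range(budget):
        reports = {c: analyst_report(c, best) for c in ANALYSTS}
        candidate, claim = scientist(best, reports, feedback, target)
        if not verifier(candidate, claim):
            feedback = "claim contradicts tool output"
            continue
        s = score(candidate, target)
        feedback = reviewer(s)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

if __name__ == "__main__":
    print(optimize("CCO", budget=5))
```

In this sketch the `budget` argument caps the number of proposals, loosely mirroring the budget-constrained setting mentioned in the TL;DR. The verifier rejects any proposal whose stated reasoning disagrees with the tool output before the candidate is scored.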
