Martini Mapper: An Automated Fragment-Based Framework for Developing Coarse-Grained Models within the Martini 3 Framework
Kevin V. Bigting, Shubhadeep Nag, Yaxin An
TL;DR
This work addresses the challenge of efficiently deriving accurate and transferable Martini 3 coarse-grained models for chemically diverse molecules. It introduces Martini Mapper, an automated, fragment-based framework that maps SMILES strings to Martini 3 representations using a curated LBBT and an expanding DBBT. The pipeline tokenizes SMILES, applies a ring-first hierarchical mapping under a path-length constraint, and produces simulation-ready topologies and coordinates, scaling to molecules with more than 120 heavy atoms. Validation across multiple datasets shows strong agreement with experimental $\Delta G_{OW}$ and $\log P$ for large portions of chemical space, with substantial gains from iterative bead refinement and dictionary expansion. The framework enables high-throughput, reproducible CG modeling relevant to drug discovery and materials design. The authors acknowledge current limitations of Martini 3 and of dictionary coverage, and outline paths toward broader coverage and potential ML-assisted optimization.
Abstract
Coarse-graining (CG) reduces molecular detail to extend the time and length scales of molecular dynamics simulations to microseconds and micrometers. However, CG approaches have long been limited by the difficulty of efficiently constructing models that are both accurate and transferable, given the large chemical diversity of materials. Among CG force fields, Martini is the most widely used, as it retains essential chemical features while offering substantial computational efficiency. Its most recent version, Martini 3, expands chemical resolution through a much broader bead set, particularly for small molecules. This flexibility, however, also complicates the mapping of organic molecules because the rules are context-dependent and standardized procedures are lacking. To address this issue, we present an automated framework that builds Martini 3 models directly from SMILES (Simplified Molecular Input Line Entry System) strings by combining a curated bead dictionary with a hierarchical, rule-based algorithm. Our framework, Martini Mapper, generated Martini 3 models for more than 5,000 molecules across four chemically diverse datasets. A curated subset of 1,081 mapped structures was benchmarked through octanol-water free-energy ($\Delta G_{OW}$) and partition-coefficient ($\log P$) calculations, yielding strong agreement with experimental values. The workflow can also map large molecules containing up to 126 heavy atoms, exceeding the capabilities of existing automated approaches. The algorithm and the complete set of more than 5,000 mapped itp/top files are available in the \href{https://github.com/eliobaby/Martini_mapping}{Martini Mapper} repository. Our framework therefore enables systematic and scalable generation of Martini 3 models for high-throughput simulations relevant to drug discovery and materials design.
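
The two benchmark quantities named above are related by a standard thermodynamic identity, which the following minimal sketch illustrates. It is not taken from the Martini Mapper code. The function name `log_p_from_solvation`, its arguments, and the default temperature of 298.15 K are assumptions made for illustration. Because the sign convention of $\Delta G_{OW}$ is not stated in this section, the sketch takes the two solvation free energies explicitly and uses $\log P = -(\Delta G_{\mathrm{oct}} - \Delta G_{\mathrm{wat}}) / (RT \ln 10)$, so that a solute preferring octanol has a positive $\log P$.

```python
import math

# Gas constant in kJ/(mol K), matching the kJ/mol energy units assumed here.
R_KJ_PER_MOL_K = 8.314462618e-3


def log_p_from_solvation(dg_water, dg_octanol, temperature=298.15):
    """Estimate log P from solvation free energies given in kJ/mol.

    Hypothetical helper for illustration only.
    dg_water:   solvation free energy of the solute in water
    dg_octanol: solvation free energy of the solute in octanol
    """
    # Free energy of transferring the solute from water into octanol.
    dg_transfer = dg_octanol - dg_water
    # log P is positive when the transfer into octanol is favorable.
    return -dg_transfer / (R_KJ_PER_MOL_K * temperature * math.log(10))


if __name__ == "__main__":
    # Illustrative input values, not results from the paper.
    print(round(log_p_from_solvation(dg_water=-20.0, dg_octanol=-30.0), 2))
```

At 298.15 K one $\log P$ unit corresponds to roughly 5.7 kJ/mol of transfer free energy, which is a convenient scale for judging deviations between computed and experimental values.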
