Martini Mapper: An Automated Fragment-Based Framework for Developing Coarse-Grained Models within the Martini 3 Framework

Kevin V. Bigting, Shubhadeep Nag, Yaxin An

TL;DR

This work addresses the challenge of efficiently deriving accurate, transferable Martini 3 coarse-grained models for chemically diverse molecules. It introduces Martini Mapper, an automated, fragment-based framework that maps SMILES strings to Martini 3 representations using a curated LBBT and an expanding DBBT. The pipeline tokenizes the SMILES string, applies a ring-first hierarchical mapping under a path-length constraint, and produces simulation-ready topologies and coordinates. It scales to molecules with more than 120 heavy atoms. Validation across multiple datasets shows strong agreement with experimental $\Delta G_{OW}$ and $\log P$ for large portions of chemical space, with substantial gains from iterative bead refinement and dictionary expansion. The framework enables high-throughput, reproducible CG modeling relevant to drug discovery and materials design. The authors acknowledge current limitations of Martini 3 and of dictionary coverage, and outline paths toward broader coverage and potential ML-assisted optimization.

Abstract

Coarse-graining (CG) reduces molecular detail to extend the time and length scales of molecular dynamics simulations to microseconds and micrometers. However, CG approaches have long been limited by the difficulty of efficiently constructing models that are both accurate and transferable, given the large chemical diversity of materials. Among CG force fields, Martini is the most widely used, as it retains essential chemical features while offering substantial computational efficiency. Its most recent version, Martini 3, expands chemical resolution through a much broader bead set, particularly for small molecules. However, this flexibility also complicates the mapping of organic molecules because of context-dependent rules and the lack of standardized procedures. To address this issue, we present an automated framework that builds Martini 3 models directly from SMILES (Simplified Molecular Input Line Entry System) strings by combining a curated bead dictionary with a hierarchical, rule-based algorithm. Our framework, Martini Mapper, generated Martini 3 models for more than 5,000 molecules across four chemically diverse datasets. A curated subset of 1,081 mapped structures was benchmarked through octanol-water free-energy ($\Delta G_{OW}$) and partition-coefficient ($\log P$) calculations, yielding strong agreement with experimental values. The workflow can also map large molecules containing up to 126 heavy atoms, exceeding the capabilities of existing automated approaches. The algorithm and the complete set of more than 5,000 mapped itp/top files are available in the Martini Mapper repository (https://github.com/eliobaby/Martini_mapping). Our framework therefore enables systematic and scalable generation of Martini 3 models for high-throughput simulations relevant to drug discovery and materials design.
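
The abstract benchmarks both $\Delta G_{OW}$ and $\log P$; the two quantities are linked by the standard thermodynamic relation $\log P = -\Delta G / (RT \ln 10)$. The sketch below illustrates that conversion only. It assumes $\Delta G_{OW}$ is the water-to-octanol transfer free energy in kJ/mol at 298.15 K; the sign convention, units, and temperature used in the paper are not stated in this excerpt, and the function name `logp_from_dg` is hypothetical.

```python
import math

# Molar gas constant in kJ/(mol*K).
R_KJ_PER_MOL_K = 8.314462618e-3


def logp_from_dg(dg_kj_per_mol, temperature_k=298.15):
    """Convert a transfer free energy to a base-10 partition coefficient.

    Assumption: dg_kj_per_mol is the free energy of moving the solute from
    water into octanol, so a negative value means the solute prefers octanol
    and the resulting log P is positive.
    """
    return -dg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k * math.log(10))


if __name__ == "__main__":
    # Hypothetical input values chosen for illustration, not taken from the paper.
    for dg in (-10.0, 0.0, 5.0):
        print(f"dG = {dg:+.1f} kJ/mol -> log P = {logp_from_dg(dg):+.2f}")
```

Under these assumptions, one $\log P$ unit corresponds to roughly 5.7 kJ/mol at room temperature, which gives a sense of scale when comparing errors reported in either quantity.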

Paper Structure

This paper contains 17 sections, 6 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The flowchart of our automated coarse-grained mapping pipeline. The process begins with SMILES input, proceeds through preprocessing (tokenization, graph construction, and mapping array generation), applies the hierarchical bead assignment algorithm, and outputs simulation-ready coordinate (.gro) and topology (.itp) files.
  • Figure 2: (a) Chemical structure of methyl 3-furancarboxylate. (b) Tokenization of the canonical SMILES string (COC(=O)C1=COC=C1) into atomic and structural symbols. Mapping scheme: (c) Partitioning of the molecule into ring (blue) and non-ring (yellow) sections according to the algorithm. (d) Final bead assignments: the aromatic ring is mapped into TC5 and TN3a beads, while the ester side chain is mapped into an N5a bead.
  • Figure 3: Mapping of representative molecules. (a) Quinoline is treated by the algorithm as two ring sections with a ring-fusion point, which is mapped to TC5e, while the remaining parts of the molecule are mapped to TC5 and TN6a according to their chemical structure (C=C to TC5 and C=N to TN6a). (b) 2-Methyl-2-butene is mapped to C4; here the path length $l$ is equal to $3$. (c) Isopropyl acetate is treated by the algorithm as a single non-ring fragment and, because $l > 3$, is mapped into two beads (SN2 and N2). (d) Acetyl-L-alanine amide is mapped into three beads (SP3a, SN5, and SN2), with $l > 3$. (e) 2,2,2-Trifluoroethanol is mapped into two beads (TP1d and SX4e). (f) Anisole is mapped into TN4a and TC5. Bead colors are chosen separately for each panel.
  • Figure 4: Distribution of heavy-atom counts for molecules successfully mapped using the DBBT, shown in blue for the three datasets (Bereau, Kaggle, and 2D) and in orange for the TPCN dataset.
  • Figure 5: Comparison of simulated versus experimental $\Delta G_{OW}$ values for the 481 molecules from the Bereau dataset before (a) and after (b) refinement. Prediction accuracy improved over seven refinement cycles using bead-level error decomposition and reparameterization. RMSE decreased from 1.47 to 0.99, and $R^2$ increased from 0.62 to 0.83. Red diagonal lines indicate ideal agreement.
  • ...and 2 more figures
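
The tokenization and ring/non-ring partitioning described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the Martini Mapper implementation: the regular expression, the function names (`tokenize`, `build_graph`, `ring_atoms`), and the bridge-based ring test are all assumptions made here, and the parser only handles simple organic-subset SMILES (no stereochemistry or charge handling). Bead assignment itself is not attempted.

```python
import re
from collections import defaultdict

# Simplified token pattern (assumption): bracket atoms, two-letter halogens,
# common organic-subset atoms, ring-closure labels, branches and bond symbols.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+/\\.:]")


def tokenize(smiles):
    """Split a SMILES string into atomic and structural tokens."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def build_graph(tokens):
    """Return (atom symbols, adjacency sets) for the heavy-atom graph."""
    atoms, adj = [], defaultdict(set)
    prev, branch_stack, open_rings = None, [], {}
    for tok in tokens:
        if tok[0] == "[" or tok[0].isalpha():       # atom token
            idx = len(atoms)
            atoms.append(tok)
            if prev is not None:
                adj[prev].add(idx)
                adj[idx].add(prev)
            prev = idx
        elif tok == "(":                            # branch opens at current atom
            branch_stack.append(prev)
        elif tok == ")":                            # return to the branch point
            prev = branch_stack.pop()
        elif tok[0].isdigit() or tok[0] == "%":     # ring-closure label
            label = tok.lstrip("%")
            if label in open_rings:
                other = open_rings.pop(label)
                adj[prev].add(other)
                adj[other].add(prev)
            else:
                open_rings[label] = prev
        elif tok == ".":                            # disconnected component
            prev = None
        # bond-order symbols (=, #, ...) do not change connectivity
    return atoms, adj


def _still_connected(adj, a, b):
    """True if b is reachable from a without using the bond a-b."""
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if u == a and v == b:
                continue                            # skip the bond under test
            if v == b:
                return True
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def ring_atoms(adj):
    """Atoms with at least one bond that lies on a cycle."""
    ring = set()
    for u in list(adj):
        for v in adj[u]:
            if u < v and _still_connected(adj, u, v):
                ring.update((u, v))
    return ring


if __name__ == "__main__":
    tokens = tokenize("COC(=O)C1=COC=C1")           # methyl 3-furancarboxylate
    atoms, adj = build_graph(tokens)
    ring = ring_atoms(adj)
    print("tokens:  ", tokens)
    print("ring:    ", sorted(ring))
    print("non-ring:", [i for i in range(len(atoms)) if i not in ring])
```

For the Figure 2 SMILES, the five furan atoms end up in the ring set and the four ester atoms in the non-ring set, mirroring the blue/yellow partition in panel (c). A production pipeline would more likely rely on a cheminformatics toolkit for parsing and ring perception; the hand-written parser here only keeps the sketch dependency-free.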