MXtalTools: A Toolkit for Machine Learning on Molecular Crystals

Michael Kilgour, Mark E. Tuckerman, Jutta Rogal

TL;DR

MXtalTools addresses the lack of purpose-built ML tooling for molecular crystals by providing GPU-accelerated, end-to-end differentiable workflows for crystal sampling, construction, and analysis. It introduces the MolData and MolCrystalData data classes, Mo3ENet molecule embeddings, and modular training utilities that support density prediction, autoencoding, and property modeling from molecular graphs, with pipelines configured through YAML and logged to Weights & Biases. The toolkit provides crystal parameterization, differentiable crystal building, and crystal scoring (MXtalNet-S), complemented by similarity measures based on the radial distribution function (RDF) and optional interfaces to machine-learned interatomic potentials (MLIPs). It is released as open source under the BSD-3 license. Together, these components support high-throughput exploration in the style of crystal structure prediction (CSP), gradient-based structure optimization, and flexible pipeline composition for molecular crystal ML research.

Abstract

We present MXtalTools, a flexible Python package for the data-driven modelling of molecular crystals, facilitating machine learning studies of the molecular solid state. MXtalTools comprises several classes of utilities: (1) synthesis, collation, and curation of molecule and crystal datasets, (2) integrated workflows for model training and inference, (3) crystal parameterization and representation, (4) crystal structure sampling and optimization, and (5) end-to-end differentiable crystal sampling, construction, and analysis. Our modular functions can be integrated into existing workflows or combined to build novel modelling pipelines. MXtalTools leverages CUDA acceleration to enable high-throughput crystal modelling. The Python code is available open-source on our GitHub page, with detailed documentation on ReadTheDocs.

Paper Structure

This paper contains 20 sections, 7 equations, 8 figures.

Figures (8)

  • Figure 1: Major components of MXtalTools.
  • Figure 2: Class summary diagram for MolData and MolCrystalData objects.
  • Figure 3: (a) Results of a crystal search run, showing the intermolecular LJ energy as a function of density (crystal packing coefficient); colors indicate the log of the RDF EMD between each sample and the experimental crystal structure. (b) RMSD and number of matched molecules for 755 structures. (c) The closest 20/20 matched cluster, with an RMSD of 0.204 Å.
  • Figure S4: Workflows for molecule and crystal data point creation.
  • Figure S5: MXtalTools main modelling workflows.
  • ...and 3 more figures
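
The Figure 3 caption refers to two quantities that are easy to illustrate: the crystal packing coefficient and an earth mover's distance (EMD) between RDFs. The sketch below is a minimal, standard-library-only illustration, assuming the packing coefficient is the occupied fraction Z · V_mol / V_cell and that the two RDFs are sampled on identical, evenly spaced bins and normalized to unit mass before comparison. The function names `packing_coefficient` and `rdf_emd` are hypothetical and are not the MXtalTools API; the package's own RDF distance may differ in normalization or in how element pairs are treated.

```python
from itertools import accumulate


def packing_coefficient(z, molecule_volume, cell_volume):
    """Fraction of the unit cell occupied by molecules: Z * V_mol / V_cell.

    Hypothetical helper for illustration; volumes must share the same units.
    """
    if cell_volume <= 0:
        raise ValueError("cell_volume must be positive")
    return z * molecule_volume / cell_volume


def rdf_emd(rdf_a, rdf_b, bin_width):
    """1D earth mover's distance between two RDF histograms.

    Assumes both histograms use the same evenly spaced bins. Each is
    normalized to unit total mass, and the distance is the bin width
    times the summed absolute difference of the cumulative distributions.
    """
    if len(rdf_a) != len(rdf_b):
        raise ValueError("RDFs must be sampled on the same bins")
    total_a, total_b = sum(rdf_a), sum(rdf_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("RDFs must have positive total mass")
    cdf_a = accumulate(x / total_a for x in rdf_a)
    cdf_b = accumulate(x / total_b for x in rdf_b)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


if __name__ == "__main__":
    # Four molecules of 150 cubic angstroms in an 800 cubic angstrom cell.
    print(packing_coefficient(4, 150.0, 800.0))
    # A single peak shifted by two bins of width 0.5 angstrom.
    print(rdf_emd([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0.5))
```

Taking the logarithm of such a distance, as in the coloring of Figure 3(a), spreads out small values so that near-matches to the experimental structure are easier to tell apart.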