Mofasa: A Step Change in Metal-Organic Framework Generation

Vaidotas Simkus, Anders Christensen, Steven Bennett, Ian Johnson, Mark Neumann, James Gin, Jonathan Godwin, Benjamin Rhodes

TL;DR

Mofasa introduces an all-atom latent diffusion model capable of generating full MOF structures of up to 500 atoms, overcoming the scalability limits of prior all-atom methods. By jointly sampling atom types, positions, and lattice vectors within a hierarchical GNS backbone and employing specialized training strategies, it achieves state-of-the-art validity and dynamic stability, and it generalizes strongly, rediscovering nodes and topologies absent from its training set. The work also releases MofasaDB to enable large-scale screening and analysis, and argues that all-atom diffusion is broadly applicable as a foundation model for materials, enabling cross-domain transfer and rapid discovery without reliance on rigid modular building-block schemes. The results indicate that an all-atom approach can match or surpass domain-specific MOF generators, with potential impact across porous materials and crystalline systems. Limitations include computational scaling and conditional sampling challenges, pointing to future work on sparsity, broader material classes, and integration with property-guided search.

Abstract

Mofasa is an all-atom latent diffusion model with state-of-the-art performance for generating Metal-Organic Frameworks (MOFs). These are highly porous crystalline materials used to harvest water from desert air, capture carbon dioxide, store toxic gases and catalyse chemical reactions. In recognition of their value, the development of MOFs recently received a Nobel Prize in Chemistry. In many ways, MOFs are well suited to generative modelling in chemistry: they are rationally designable materials with a large combinatorial design space and strong structure-property couplings. And yet, to date, a high-performance generative model has been lacking. To fill this gap, we introduce Mofasa, a general-purpose latent diffusion model that jointly samples positions, atom types and lattice vectors for systems as large as 500 atoms. Mofasa avoids the handcrafted assembly algorithms common in the literature, unlocking the simultaneous discovery of metal nodes, linkers and topologies. To help the scientific community build on our work, we release MofasaDB, an annotated library of hundreds of thousands of sampled MOF structures, along with a user-friendly web interface for search and discovery: https://mofux.ai/

Paper Structure

This paper contains 57 sections, 11 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: Validation of geometric structure with MOFChecker. Mofasa demonstrates a step-change improvement over the leading all-atom baseline, ADiT (Joshi et al., 2025), increasing overall MOFChecker validity by $3.8\times$, from 15.7% to 59.9%.
  • Figure 2: Rediscovery and novelty analysis. Mofasa demonstrates strong generalization by rediscovering 437 nodes and 38 topologies absent from the training set. Beyond reproducing known chemistry, the model also generates MOFs with chemistries unseen in experimental databases.
  • Figure 3: Potential energy histograms on the Boyd-Woo (BW) dataset (Boyd et al., 2017; Boyd et al., 2019). Note that Mofasa (black) and MOFFlow-2 (red) are trained on slightly different subsets of BW (see the training data section), and that the Mofasa energies (black, blue) were computed with Orb-v3-con-inf-omat (Rhodes et al., 2025), whereas those for MOFFlow-2 and MOFDiff were computed with UMA (Wood et al., 2025). Nonetheless, the trend is clear: Mofasa is much better at matching the energy distribution of its training data.
  • Figure 4: Validity, Novelty, and Uniqueness (VNU) analysis. Percentage of QMOF samples for which a MOFid exists (purple), is also unique (teal), and is also novel (light green). Top row: all samples without validity constraints; the maximum score is $84\%$. Bottom row: only MOFChecker-valid systems; the maximum score is $70\%$. See the VNU table in the Appendix for full numerical results.
  • Figure 5: Marginal distributions of simple properties of real data (QMOF, black) and generated samples (Mofasa, blue). There is substantial distributional overlap for all properties, with no signs of severe mode collapse. Nonetheless, several areas remain for improvement: generated MOFs are, on average, slightly too small, too cubic, far too likely to be triclinic, more likely to fail MOFid identification (resulting in 0 nodes/linkers), and insufficiently porous as assessed by Zeo++ with a $1.86$ Å nitrogen probe (note the legends of the final row, which state the percentage of systems for which zero volume/area is accessible).
  • ...and 4 more figures
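
The nested percentages in the Figure 4 caption (a MOFid exists, is unique, is novel) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. It assumes that each sample yields either a MOFid string or `None` when identification fails, that "unique" means distinct MOFids among the samples, that "novel" means distinct MOFids absent from a reference set, and that all three are reported as a percentage of the total sample count. The function name `vnu_percentages` and the toy identifiers are hypothetical.

```python
from typing import Optional, Sequence, Set


def vnu_percentages(
    sample_ids: Sequence[Optional[str]],
    reference_ids: Set[str],
) -> dict:
    """Cumulative valid / unique / novel percentages over all samples.

    Assumption: None marks a sample for which no MOFid could be assigned.
    """
    total = len(sample_ids)
    if total == 0:
        raise ValueError("need at least one sample")

    # Valid: an identifier exists for the sample.
    valid = [s for s in sample_ids if s is not None]
    # Unique: distinct identifiers among the valid samples.
    unique = set(valid)
    # Novel: distinct identifiers that do not appear in the reference set.
    novel = unique - reference_ids

    return {
        "valid": 100.0 * len(valid) / total,
        "valid_unique": 100.0 * len(unique) / total,
        "valid_unique_novel": 100.0 * len(novel) / total,
    }


# Toy data: five samples, one identification failure, one duplicate,
# and one identifier ("B") already present in the reference set.
samples = ["A", "A", "B", None, "C"]
reference = {"B"}
result = vnu_percentages(samples, reference)
print(result)
```

Because each category is a subset of the previous one, the three percentages are non-increasing, which matches the nested bars described in the caption.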