System of Agentic AI for the Discovery of Metal-Organic Frameworks

Theo Jaffrelot Inizan, Sherry Yang, Aaron Kaplan, Yen-hsu Lin, Jian Yin, Saber Mirzaei, Mona Abdelgaid, Ali H. Alawadhi, KwangHwan Cho, Zhiling Zheng, Ekin Dogus Cubuk, Christian Borgs, Jennifer T. Chayes, Kristin A. Persson, Omar M. Yaghi

TL;DR

MOFGen presents a compact, agentic-AI framework that integrates LLM-driven linker design, diffusion-based MOF crystal generation, and multi-tier quantum-mechanical screening to rapidly identify synthesizable MOFs. The system couples LinkerGen, CrystalGen, QForge, SynthABLE, QHarden, and SynthGen into a coherent pipeline validated by the experimental synthesis of five AI-generated MOFs, including de novo and reimagined linkers. By leveraging expansive training data and high-throughput synthesis, MOFGen expands the searchable MOF space with synthesizability constraints and demonstrates scalable generation of viable candidates. This work advances toward autonomous, AI-guided materials discovery with tangible laboratory realization, with implications for MOFs in CO2 capture, water harvesting, and related applications.

Abstract

Generative models and machine learning promise accelerated material discovery in MOFs for CO2 capture and water harvesting but face significant challenges navigating vast chemical spaces while ensuring synthesizability. Here, we present MOFGen, a system of Agentic AI comprising interconnected agents: a large language model that proposes novel MOF compositions, a diffusion model that generates crystal structures, quantum mechanical agents that optimize and filter candidates, and synthetic-feasibility agents guided by expert rules and machine learning. Trained on all experimentally reported MOFs and computational databases, MOFGen generated hundreds of thousands of novel MOF structures and synthesizable organic linkers. Our methodology was validated through high-throughput experiments and the successful synthesis of five "AI-dreamt" MOFs, representing a major step toward automated synthesizable material discovery.
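
The staged design described in the abstract (propose, generate, filter, assess feasibility) can be pictured as a screening funnel in which each stage passes on only the candidates it accepts. The following is a minimal sketch under stated assumptions, not the MOFGen implementation: the `Candidate` fields, the stage names, the 0.5 threshold, and the toy data are all hypothetical and stand in for the real agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """Toy stand-in for a generated MOF candidate (all fields hypothetical)."""
    composition: str      # label for the proposed chemical composition
    porous: bool          # outcome a porosity filter would compute
    synth_score: float    # outcome a synthesizability predictor would compute
    log: List[str] = field(default_factory=list)  # stages this candidate passed


Stage = Callable[[List[Candidate]], List[Candidate]]


def keep_porous(cands: List[Candidate]) -> List[Candidate]:
    """Drop candidates flagged as non-porous."""
    return [c for c in cands if c.porous]


def keep_synthesizable(cands: List[Candidate], threshold: float = 0.5) -> List[Candidate]:
    """Drop candidates whose (hypothetical) synthesizability score is below threshold."""
    return [c for c in cands if c.synth_score >= threshold]


def run_funnel(
    cands: List[Candidate], stages: List[Tuple[str, Stage]]
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply stages in order and record how many candidates survive each one."""
    counts = [("input", len(cands))]
    for name, stage in stages:
        cands = stage(cands)
        for c in cands:
            c.log.append(name)  # remember which filters this candidate passed
        counts.append((name, len(cands)))
    return cands, counts


# Toy pool; the labels and values are illustrative, not data from the paper.
pool = [
    Candidate("toy-A", porous=True, synth_score=0.9),
    Candidate("toy-B", porous=False, synth_score=0.8),
    Candidate("toy-C", porous=True, synth_score=0.2),
]
survivors, counts = run_funnel(
    pool,
    [("porosity_filter", keep_porous), ("synthesizability_filter", keep_synthesizable)],
)
for name, n in counts:
    print(f"{name}: {n} candidates")
```

Ordering the cheap filters before the expensive ones is the usual reason for a funnel like this: the per-stage counts show how much each filter reduces the work handed to the next stage.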

Paper Structure

This paper contains 17 sections, 19 figures, 3 tables.

Figures (19)

  • Figure 1: Overview of MOFGen. 1) MOFMaster: Manages the overall system and serves as the user interface. 2) LinkerGen: An in-context learning LLM agent that proposes novel chemical compositions. 3) CrystalGen: A denoising diffusion probabilistic model, conditioned on chemical compositions, that generates crystal structures. 4) QForge: A sequence of geometry optimization and Monte Carlo sampling steps that filters out non-porous structures. 5) SynthABLE: Decomposes the diffusion-generated MOFs into their building blocks and assesses synthesizability using multi-fidelity rules and ML-based predictors. 6) QHarden: A sequence of medium-to-high-level quantum mechanical geometry optimizations and formation energy evaluations, from PBE-D4 to r$^2$SCAN-D4. 7) SynthGen: A high-throughput synthesis platform for synthesizing MOFs, combined with an LLM synthesis-planning system; the crystals are then characterized by SCXRD or PXRD and added to the pool of experimental structures.
  • Figure 2: Analysis of MOF organic linkers with SynthABLE. Analysis of the 80,968 organic linkers extracted with MOFid [bucior_identification_2019] from the diffusion-generated MOF crystal structures, after optimization and filtering through QForge. a, t-SNE projection [maaten_visualizing_2008] of diffusion-generated MOF linkers. For each molecule, the mean and standard deviation of the atomic descriptors from the MACE-OMAT-0 model were computed and concatenated to form the input to the t-SNE algorithm. b, Synthesizability assessment with the Allchemy software. c, SCScore [coley_scscore_2018], SA score [ertl_estimation_2009], and BR-SAScore distributions of the diffusion-generated organic linkers (Diffusion Model + LLM), compared with the curated experimental database and the initial LLM-generated linker chemical formulae used as input to the diffusion model.
  • Figure 3: Analysis of MOF crystal structures selected by QHarden. a, Predicted decomposition temperature ($T_{d}$) distributions, b, stability upon solvent removal [nandy_using_2021], and c, bulk modulus distributions computed with MACE-MP-0b, for the experimental MOF structures (QMOF database [rosen_machine_2021]) (blue), the QMOF subset with chemical compositions matching the diffusion-generated MOFs and with pcu topology (pink), and the diffusion-generated MOFs, denoted Diffusion Model + LLM (orange). d, Formation energy computed at the r$^2$SCAN-D4 level for diffusion-generated MOFs compared with the Materials Project data. e, Representative samples of the "AI-dreamt" MOFs with their corresponding topologies and computed properties ($T_{d}$: decomposition temperature; POAV: probe-occupiable accessible volume).
  • Figure 4: Synthesis strategies and crystal structures of MOFs. Overview of the different strategies used to synthesize the MOFs, alongside the crystal structures, the experimental (red) and simulated (black) PXRD patterns, and the organic linkers. After filtering the crystal structures with QHarden, the organic linkers of the top MOF candidates were extracted and synthesized. a, Crystal structure of AI-MOF-1, with corresponding organic linkers and experimental versus simulated PXRD patterns, following the crossover mutation strategy. b, Crystal structure of AI-MOF-2, showing the organic linkers before and after modification, with experimental and simulated PXRD patterns, following the reimagination strategy. c, Crystal structures of AI-MOF-3, AI-MOF-4, and AI-MOF-5 with their PXRD patterns, following the de novo design strategy.
  • Figure S1: Example of the prompt used for organic linker SMILES generation. A general prompt giving context on the MOF field is supplied as the system prompt. The user prompt includes the experimental MOF organic linkers; the kekulized, validated SMILES are then stored in a CSV file and converted to chemical formulae.
  • ...and 14 more figures
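
The Figure 2 caption describes turning a variable number of per-atom descriptors into one fixed-length vector per molecule by concatenating their mean and standard deviation. The sketch below illustrates only that pooling step, under stated assumptions: the function name `pool_descriptors` is hypothetical, the descriptors are toy numbers rather than MACE-OMAT-0 outputs, and the population standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def pool_descriptors(atom_descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-atom descriptors into one fixed-length molecular vector.

    Input: one descriptor vector per atom, all of the same length d.
    Output: a vector of length 2*d holding the per-dimension means
    followed by the per-dimension standard deviations.
    """
    if not atom_descriptors:
        raise ValueError("a molecule needs at least one atom")
    # Transpose so that each column gathers one descriptor dimension across atoms.
    columns = list(zip(*atom_descriptors))
    means = [fmean(col) for col in columns]
    # Assumption: population standard deviation (0.0 for a single atom).
    stds = [pstdev(col) for col in columns]
    return means + stds


# Toy molecule with two atoms and two descriptor dimensions (illustrative values).
toy_molecule = [[1.0, 2.0], [3.0, 6.0]]
vector = pool_descriptors(toy_molecule)
print(vector)
```

Because the pooled vector has the same length for every molecule regardless of atom count, the vectors for all linkers can be stacked into a single matrix and handed to a t-SNE implementation for the two-dimensional projection.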