Reshaping MOFs text mining with a dynamic multi-agents framework of large language model

Zuhong Lin, Daoyuan Ren, Kai Ran, Jing Sun, Songlin Yu, Xuefeng Bai, Xiaotian Huang, Haiyang He, Pengxu Pan, Ying Fang, Zhanglin Li, Haipu Li, Jingjing Yao

TL;DR

The paper tackles the challenge of extracting MOF synthesis parameters from unstructured literature. It proposes MOFh6, a dynamic multi-agent framework that combines LLMs with rule-based components to read articles and crystallographic data and produce standardized synthesis tables. Reported results include 99% extraction accuracy, 94.1% abbreviation resolution, and a precision of about 0.93, with processing fast and cheap enough to support scalable literature-to-protocol conversion. This approach enables data-driven MOF design and broader digitization of materials discovery.

Abstract

Accurately identifying the synthesis conditions of metal-organic frameworks (MOFs) is essential for guiding experimental design, yet remains challenging because relevant information in the literature is often scattered, inconsistent, and difficult to interpret. We present MOFh6, a large language model-driven system that reads raw articles or crystal codes and converts them into standardized synthesis tables. It links related descriptions across paragraphs, unifies ligand abbreviations with full names, and outputs structured parameters ready for use. MOFh6 achieved 99% extraction accuracy, resolved 94.1% of abbreviation cases across five major publishers, and maintained a precision of 0.93 ± 0.01. Processing a full text takes 9.6 s and locating synthesis descriptions takes 36 s, with 100 papers processed for USD 4.24. By replacing static database lookups with real-time extraction, MOFh6 reshapes MOF synthesis research, accelerating the conversion of literature knowledge into practical synthesis protocols and enabling scalable, data-driven materials discovery.

Paper Structure

This paper contains 14 sections, 6 equations, 69 figures, 2 tables.

Figures (69)

  • Figure 1: Enterprise architecture of MOFh6.
  • Figure 2: Overview pipeline of MOFh6. Task I operates through collaborative LLM agents; Task II integrates LLMs with a rule engine for constrained synthesis query parsing; and Task III leverages the Qt framework and the Hugging Face ecosystem to support structure visualization and CIF services.
  • Figure 3: Query & Answer of MOFh6. User commands for initiating literature crawling (a), sequential agent-based mining of synthesis data (b), output of structured synthesis parameters (c), real-time app interface (d).
  • Figure 4: Agents performance evaluation: The performance of different sample fine-tuning models on their test set in the dataset (a), the ability of coreference resolution (b), the ability of the model to extract specific MOFs synthesis paragraphs under different pool sizes (c), and the ability of the model to extract structured data (d).
  • Figure S1: The crawler module of MOFh6
  • ...and 64 more figures