Automated Extraction of Mechanical Constitutive Models from Scientific Literature using Large Language Models: Applications in Cultural Heritage Conservation

Rui Hu, Yue Wu, Tianhao Su, Yin Wang, Shunbo Hu, Jizhong Huang

TL;DR

The paper tackles the problem that mechanical constitutive models for heritage materials are scattered across the literature, which hampers Digital Twin development. It introduces a two-stage Gatekeeper–Analyst framework that uses LLMs to extract equations, calibrated parameters, and metadata from PDFs, aided by context-aware symbolic grounding and schema-constrained decoding. Applied to more than 2,000 papers, the framework isolated 113 core documents, from which it extracted 185 constitutive model instances and more than 450 calibrated parameters, with a precision of 80.4%, a recall of 83.3%, and an F1 score of 81.9%. The resulting Heritage Materials Constitutive Database Platform provides intelligent data ingestion and semantic retrieval, turning dispersed literature into a queryable digital asset that supports numerical simulations and Digital Material Twin development for built heritage.
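To make the coarse-to-fine control flow concrete, here is a minimal sketch of a two-stage filter-then-extract pipeline. It is not the authors' implementation: the keyword-count `gatekeeper` is a hypothetical stand-in for the low-cost LLM relevance filter, `analyst` returns a canned record instead of calling a model, and `REQUIRED_FIELDS` is an assumed schema. Note also that schema-constrained decoding restricts the model's output during generation; the `is_valid` check below is a weaker, post-hoc substitute that only rejects malformed records afterwards.

```python
from dataclasses import dataclass

# Hypothetical keywords standing in for the Gatekeeper's LLM relevance judgment.
GATEKEEPER_KEYWORDS = ("constitutive", "stress", "strain", "elastic", "plastic", "damage")

# Hypothetical minimal schema: required fields of one extracted model record.
REQUIRED_FIELDS = {"model_type": str, "equation": str, "parameters": dict, "material": str}


@dataclass
class Paper:
    paper_id: str
    text: str


def gatekeeper(paper: Paper, min_hits: int = 2) -> bool:
    """Cheap relevance filter: keep papers that mention enough domain keywords."""
    text = paper.text.lower()
    hits = sum(1 for kw in GATEKEEPER_KEYWORDS if kw in text)
    return hits >= min_hits


def analyst(paper: Paper) -> list[dict]:
    """Placeholder for the high-capability extraction step (an LLM call in the paper)."""
    # Returns a canned record so the sketch runs without any model access.
    return [{
        "model_type": "elastic",
        "equation": r"\sigma = E \varepsilon",
        "parameters": {"E": {"value": 5.0, "unit": "GPa"}},
        "material": "unspecified",
        "source": paper.paper_id,
    }]


def is_valid(record: dict) -> bool:
    """Post-hoc schema check: every required field is present with the expected type."""
    return all(isinstance(record.get(key), typ) for key, typ in REQUIRED_FIELDS.items())


def run_pipeline(papers: list[Paper]) -> list[dict]:
    database = []
    for paper in papers:
        if not gatekeeper(paper):      # stage 1: discard irrelevant literature
            continue
        for record in analyst(paper):  # stage 2: fine-grained extraction
            if is_valid(record):       # keep only schema-conforming records
                database.append(record)
    return database


papers = [
    Paper("p1", "A damage-plasticity constitutive model relating stress and strain in lime mortar."),
    Paper("p2", "A survey of visitor flows in a museum."),
]
db = run_pipeline(papers)
print(len(db), db[0]["source"])  # 1 p1
```

The design point the sketch illustrates is cost asymmetry: the expensive extraction step runs only on documents that survive the cheap filter, which is why most of a 2,000-paper corpus never reaches the Analyst.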

Abstract

The preservation of cultural heritage is increasingly transitioning towards data-driven predictive maintenance and "Digital Twin" construction. However, the mechanical constitutive models required for high-fidelity simulations remain fragmented across decades of unstructured scientific literature, creating a "Data Silo" that hinders conservation engineering. To address this, we present an automated, two-stage agentic framework leveraging Large Language Models (LLMs) to extract mechanical constitutive equations, calibrated parameters, and metadata from PDF documents. The workflow employs a resource-efficient "Gatekeeper" agent for relevance filtering and a high-capability "Analyst" agent for fine-grained extraction, featuring a novel Context-Aware Symbolic Grounding mechanism to resolve mathematical ambiguities. Applied to a corpus of over 2,000 research papers, the system isolated 113 core documents and constructed a structured database containing 185 constitutive model instances and over 450 calibrated parameters. The extraction precision reached 80.4%, establishing a "Human-in-the-loop" workflow that reduces manual data curation time by approximately 90%. We demonstrate the system's utility through a web-based Knowledge Retrieval Platform, which enables rapid parameter discovery for computational modeling. This work transforms scattered literature into a queryable digital asset, laying the data foundation for the "Digital Material Twin" of built heritage.
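The idea behind context-aware symbolic grounding is that the meaning of a symbol should come from the paper that uses it, not from a global convention. The sketch below illustrates that priority order with a simple regular-expression stand-in (the paper uses an LLM Analyst, not regexes): definitions found in nearby text such as "where xi is ..." win, a hypothetical fallback glossary is used only when the paper is silent, and anything else is flagged for human review. `DEFAULT_GLOSSARY`, the clause pattern, and the example sentence are all illustrative assumptions.

```python
import re

# Hypothetical fallback glossary, used only when the paper itself does not define a symbol.
DEFAULT_GLOSSARY = {"E": "Young's modulus", "nu": "Poisson's ratio", "eta": "viscosity"}

# Matches clauses such as "where xi is the ..." or "and E denotes the ...".
DEF_PATTERN = re.compile(
    r"\b(?:where|and)\s+(?P<sym>[A-Za-z_\\]+)\s+(?:is|denotes|represents)\s+"
    r"(?:the\s+)?(?P<meaning>[^,;.]+)",
    re.IGNORECASE,
)


def ground_symbols(symbols, context):
    """Map each symbol to (meaning, origin), preferring definitions in the local context."""
    local = {
        m.group("sym").lstrip("\\"): m.group("meaning").strip()  # "\xi" -> "xi"
        for m in DEF_PATTERN.finditer(context)
    }
    grounded = {}
    for sym in symbols:
        if sym in local:
            grounded[sym] = (local[sym], "context")
        elif sym in DEFAULT_GLOSSARY:
            grounded[sym] = (DEFAULT_GLOSSARY[sym], "glossary")
        else:
            grounded[sym] = (None, "unresolved")  # route to human-in-the-loop review
    return grounded


context = ("The evolution law is given in Eq. (3), where xi is the structural kinetics "
           "parameter, and E denotes the instantaneous modulus.")
result = ground_symbols(["xi", "E", "nu", "lambda"], context)
for sym, (meaning, origin) in result.items():
    print(sym, meaning, origin)
```

Here `E` is grounded as "instantaneous modulus" from the local text rather than the generic "Young's modulus", `nu` falls back to the glossary, and `lambda` is left unresolved rather than guessed.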

Paper Structure

This paper contains 21 sections, 2 equations, and 6 figures.

Figures (6)

  • Figure 1: Overview of the Two-Stage Agentic Framework. The workflow adopts a coarse-to-fine strategy: (1) Raw PDF ingestion and serialization; (2) The Gatekeeper acts as a low-cost filter to discard irrelevant literature; (3) The Analyst employs deep reasoning for symbolic grounding; (4) The final parameters are stored in a structured JSON database.
  • Figure 2: Distribution of Constitutive Mechanisms. The framework successfully categorized extracted models into distinct rheological behaviors. The prevalence of plasticity and damage models aligns with the need for safety assessment in heritage structures.
  • Figure 3: Quantitative Evaluation of the Framework. (a) Confusion Matrix showing the extraction performance (TP=185, TN=1311); (b) ROC Curve with an AUC of 0.782. The selected operating point (Red Dot) corresponds to a low False Positive Rate (3.3%), prioritizing the reliability of database entries.
  • Figure 4: Qualitative Extraction Case Study. The extraction process for a Jeffreys-type Viscoelastic model. (A) The raw PDF presents the constitutive law components scattered across disjointed sections and data in a complex table. (B) The Agent fuses these components into a structured JSON while filtering derivation noise. (C) The final Verified Knowledge Entry shows correct Symbolic Grounding (mapping $\xi$ to structural kinetics) and Physical Plausibility Verification (resolving the ambiguous header scale via domain constraints).
  • Figure 5: Automated Data Ingestion Interface. The detail view shows the extraction results from an uploaded PDF. The system automatically identifies the constitutive model, renders the equation, and populates the fitted parameters for user verification.
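The metrics quoted for Figure 3 follow from the confusion-matrix counts via the standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = 2PR/(P+R), and FPR = FP/(FP+TN). The sketch below computes them. TP = 185 and TN = 1311 come from Figure 3; the FP and FN counts are not given in this summary, so FP = 45 and FN = 37 are hypothetical values back-calculated because they reproduce the reported 80.4% precision, 83.3% recall, 81.9% F1, and 3.3% FPR.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)  # false positive rate at the chosen operating point
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# TP and TN are from Figure 3; FP and FN are hypothetical, back-calculated values.
m = confusion_metrics(tp=185, fp=45, fn=37, tn=1311)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
# precision: 80.4%
# recall: 83.3%
# f1: 81.9%
# fpr: 3.3%
```

One caveat when checking such figures by hand: plugging the rounded 80.4% and 83.3% into the F1 formula gives about 81.8%, so the reported 81.9% is consistent with computing F1 from unrounded counts.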