Automated Extraction of Mechanical Constitutive Models from Scientific Literature using Large Language Models: Applications in Cultural Heritage Conservation
Rui Hu, Yue Wu, Tianhao Su, Yin Wang, Shunbo Hu, Jizhong Huang
TL;DR
The paper addresses the fragmentation of the literature on mechanical constitutive models for heritage materials, which hampers Digital Twin development. It introduces a two-stage Gatekeeper–Analyst framework that uses LLMs to extract equations, calibrated parameters, and metadata from PDFs, aided by context-aware symbolic grounding and schema-constrained decoding. Applied to a corpus of over 2,000 papers, the framework identified 113 core documents. From these it extracted 185 constitutive model instances and more than 450 calibrated parameters, with a precision of 80.4%, a recall of 83.3%, and an F1 score of 81.9%. The resulting Heritage Materials Constitutive Database Platform provides intelligent data ingestion and semantic retrieval. It turns dispersed literature into a queryable digital asset that supports numerical simulation and Digital Material Twin development for built heritage.
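As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision P and recall R. Plugging in the rounded values above gives roughly 81.8%. This agrees with the reported 81.9% once the rounding of P and R to one decimal place is taken into account. The raw true/false positive counts are not given in this section, so the computation below uses only the rounded figures.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.804 \times 0.833}{0.804 + 0.833}
    = \frac{1.339464}{1.637}
    \approx 0.818
```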
Abstract
The preservation of cultural heritage is increasingly shifting towards data-driven predictive maintenance and "Digital Twin" construction. However, the mechanical constitutive models required for high-fidelity simulations remain fragmented across decades of unstructured scientific literature, creating a "Data Silo" that hinders conservation engineering. To address this, we present an automated, two-stage agentic framework that leverages Large Language Models (LLMs) to extract mechanical constitutive equations, calibrated parameters, and metadata from PDF documents. The workflow employs a resource-efficient "Gatekeeper" agent for relevance filtering and a high-capability "Analyst" agent for fine-grained extraction. It also incorporates a novel Context-Aware Symbolic Grounding mechanism to resolve mathematical ambiguities. Applied to a corpus of over 2,000 research papers, the system identified 113 core documents and constructed a structured database containing 185 constitutive model instances and over 450 calibrated parameters. The extraction precision reached 80.4%, enabling a "Human-in-the-loop" workflow that reduces manual data curation time by approximately 90%. We demonstrate the system's utility through a web-based Knowledge Retrieval Platform that supports rapid parameter discovery for computational modeling. This work transforms scattered literature into a queryable digital asset, laying the data foundation for the "Digital Material Twin" of built heritage.
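The two-stage Gatekeeper/Analyst workflow described in the abstract can be sketched as a filter-then-extract pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. The record schema (`Parameter`, `ConstitutiveModel`) is hypothetical. `keyword_gatekeeper` is a cheap keyword heuristic standing in for the LLM Gatekeeper, and `stub_analyst` returns fixed, made-up output in place of an LLM Analyst call. Schema enforcement is shown here as post-hoc validation of the Analyst's output. That is a simpler substitute for schema-constrained decoding, where the constraint is applied during generation itself. The `meaning` field marks where a grounded interpretation of a symbol would be stored, but symbolic grounding itself is not implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical record schema; the paper's actual schema is not given in this section.
@dataclass
class Parameter:
    symbol: str    # symbol as printed in the paper, e.g. "E"
    meaning: str   # grounded physical meaning, e.g. "Young's modulus"
    value: float
    unit: str

@dataclass
class ConstitutiveModel:
    material: str
    equation: str  # equation kept as a LaTeX string
    parameters: list[Parameter] = field(default_factory=list)
    source: str = ""  # identifier of the document the record came from

REQUIRED_KEYS = {"material", "equation", "parameters"}
PARAM_KEYS = {"symbol", "meaning", "value", "unit"}

def validate(record: dict) -> Optional[ConstitutiveModel]:
    """Return a typed record, or None if the raw output breaks the schema."""
    if not REQUIRED_KEYS.issubset(record) or not isinstance(record["parameters"], list):
        return None
    params = []
    for p in record["parameters"]:
        if not isinstance(p, dict) or set(p) != PARAM_KEYS:
            return None
        value = p["value"]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        params.append(Parameter(p["symbol"], p["meaning"], float(value), p["unit"]))
    return ConstitutiveModel(record["material"], record["equation"], params)

# Stage 1 stand-in: a cheap keyword filter (hypothetical; the paper uses an LLM Gatekeeper).
KEYWORDS = ("constitutive", "stress-strain", "damage model", "calibrated")

def keyword_gatekeeper(text: str, min_hits: int = 2) -> bool:
    lowered = text.lower()
    return sum(k in lowered for k in KEYWORDS) >= min_hits

def run_pipeline(
    docs: dict[str, str],
    gatekeeper: Callable[[str], bool],
    analyst: Callable[[str], list[dict]],
) -> list[ConstitutiveModel]:
    results = []
    for doc_id, text in docs.items():
        if not gatekeeper(text):   # stage 1: drop irrelevant documents cheaply
            continue
        for raw in analyst(text):  # stage 2: costly fine-grained extraction
            model = validate(raw)  # keep only schema-conforming records
            if model is not None:
                model.source = doc_id
                results.append(model)
    return results

def stub_analyst(text: str) -> list[dict]:
    """Placeholder for the LLM Analyst; ignores its input and returns fixed output."""
    return [
        {"material": "lime mortar", "equation": r"\sigma = E \varepsilon",
         "parameters": [{"symbol": "E", "meaning": "Young's modulus",
                         "value": 1.0, "unit": "GPa"}]},  # illustrative value only
        {"material": "brick", "equation": r"\sigma = f(\varepsilon)"},  # no parameters: rejected
    ]

if __name__ == "__main__":
    docs = {
        "paper_a": "A constitutive model for lime mortar with calibrated stress-strain curves.",
        "paper_b": "A survey of visitor numbers at heritage sites.",
    }
    models = run_pipeline(docs, keyword_gatekeeper, stub_analyst)
    print(len(models), models[0].source)  # prints "1 paper_a"
```

Placing the cheap filter first means the expensive extraction step runs only on documents that pass it. This is the cost rationale behind separating a resource-efficient Gatekeeper from a high-capability Analyst.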
