AutoFAIR: Automatic Data FAIRification via Machine Reading
Tingyan Ma, Wei Liu, Bin Lu, Xiaoying Gan, Yunqiang Zhu, Luoyi Fu, Chenghu Zhou
TL;DR
This work presents AutoFAIR, an automated architecture to FAIRify data by linking data/metadata operations to FAIR indicators and employing a two-stage Web Reader (DOM-based GNN node classification and LM-driven extraction) together with FAIR Alignment (ontology guidance and semantic matching) to produce machine-readable, standards-aligned metadata. A case study in mountain hazards shows substantial improvements in Findability, Accessibility, Interoperability, and Reusability, including the generation of spatiotemporal maps and searchable metadata profiles. By evaluating 7124 datasets across 512 domains, AutoFAIR demonstrates cross-domain applicability and scalable automation for data sharing and reuse, while acknowledging dependence on the source webpages' information richness. The approach enhances data discovery and reuse in practice and provides a blueprint for broader automated FAIRification across scientific domains.
Abstract
The explosive growth of data fuels data-driven research and facilitates progress across diverse domains. The FAIR principles have emerged as a guiding standard that aims to enhance the findability, accessibility, interoperability, and reusability of data. However, current efforts focus primarily on manual data FAIRification, which handles only targeted data and is inefficient. To address this issue, we propose AutoFAIR, an architecture designed to enhance data FAIRness automatically. First, we align each data and metadata operation with specific FAIR indicators to guide machine-executable actions. Then, we use Web Reader to automatically extract metadata based on language models, even when data webpages lack a structured schema. Subsequently, FAIR Alignment makes the metadata comply with the FAIR principles through ontology guidance and semantic matching. Finally, by applying AutoFAIR to various data, especially in the field of mountain hazards, we observe significant improvements in the findability, accessibility, interoperability, and reusability of data. The FAIRness scores before and after applying AutoFAIR indicate enhanced data value.
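
To make the alignment step more concrete, the following is a minimal sketch of mapping raw, webpage-extracted field names onto a fixed metadata vocabulary. It is not the AutoFAIR implementation. The target vocabulary `TARGET_FIELDS`, the function names, and the threshold are hypothetical, and character-level string similarity from Python's standard `difflib` stands in for the semantic matching described above, which would more plausibly rely on learned embeddings and an ontology.

```python
from difflib import SequenceMatcher

# Hypothetical target vocabulary: standard field -> known aliases.
# This is an illustrative assumption, not the schema used by AutoFAIR.
TARGET_FIELDS = {
    "title": ["title", "dataset name", "name"],
    "description": ["description", "abstract", "summary"],
    "spatial": ["spatial coverage", "region", "location"],
    "temporal": ["temporal coverage", "time range", "period"],
    "license": ["license", "terms of use"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(raw: dict, threshold: float = 0.6):
    """Map raw field names to target fields; keep the rest as unmatched.

    Each raw key is compared against every alias of every target field,
    and assigned to the best-scoring field if the score meets the threshold.
    """
    aligned, unmatched = {}, {}
    for key, value in raw.items():
        best_field, best_score = None, 0.0
        for field, aliases in TARGET_FIELDS.items():
            score = max(similarity(key, alias) for alias in aliases)
            if score > best_score:
                best_field, best_score = field, score
        if best_field is not None and best_score >= threshold:
            aligned[best_field] = value
        else:
            unmatched[key] = value
    return aligned, unmatched


# Example input: field names as they might appear on a data webpage.
raw_metadata = {
    "Dataset Name": "Debris flow inventory",
    "Abstract": "Catalog of recorded debris flow events.",
    "Region": "Hengduan Mountains",
    "DOI": "unknown",
}
aligned, unmatched = align(raw_metadata)
```

Fields that fall below the threshold are kept separately rather than dropped, so that a later step (or a human curator) can decide how to handle them.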
