Rethinking the production and publication of machine-reusable expressions of research findings

Markus Stocker, Lauren Snyder, Matthew Anfuso, Oliver Ludwig, Freya Thießen, Kheir Eddine Farfar, Muhammad Haris, Allard Oelen, Mohamad Yaser Jaradeh

TL;DR

This paper introduces reborn, a pre-publication approach that makes machine-reusable scientific knowledge intrinsic to the research lifecycle by extending data analysis with structured data-type schemata and publishing these expressions as interlinked supplementary data via the Open Research Knowledge Graph (ORKG). Through three use cases across soil science, computer science, and agroecology, it demonstrates higher knowledge richness and accuracy compared to traditional post-publication extraction, while outlining roles for publishers, template registries, and data interoperability. The work argues for technical feasibility and scalability through community-driven templates and FAIR data practice integration, while acknowledging limitations in qualitative knowledge transfer and retroactive adoption. It also outlines future directions, including decoupling data types from ORKG, improving terminological annotation, and developing tools to assist manuscript writing from machine-reusable knowledge.

Abstract

Literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine reusable. To facilitate knowledge reuse, e.g. for synthesis research, scientific knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccuracies associated with completing these activities manually have driven the development of techniques that automate knowledge extraction. Tackling the problem with a different mindset, we propose a pre-publication approach, known as reborn, that ensures scientific knowledge is born reusable, i.e. produced in a machine-reusable format during knowledge production. We implement the approach using the Open Research Knowledge Graph infrastructure for FAIR scientific knowledge organization. We test the approach with three use cases and discuss the role of publishers and editors in scaling the approach. Our results suggest that the proposed approach is superior to classical manual and semi-automated post-publication extraction techniques in terms of knowledge richness and accuracy as well as technological simplicity.

Paper Structure

This paper contains 22 sections, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Scientific knowledge expressed in articles is produced as machine-reusable data in computing environments during the data analysis phase of the research lifecycle. Machine-reusable scientific knowledge is deposited in a data repository as supplementary data of the article and interlinked with the article in DOI metadata. Finally, to support reuse, e.g. for synthesis research, machine-reusable scientific knowledge is collected and organized in aggregation systems, such as knowledge graphs.
  • Figure 2: Display of the research finding published by Gentsch et al. in their Figure 1 as a research contribution in ORKG. The overlay expands on the interlinked R script snippet used to implement the respective data analysis. For an interactive experience, we refer readers to the version published online at https://doi.org/10.48366/R664252.
  • Figure 3: Display of a Leaderboard showing the performance Scores (Metric F1 score @ Layer 1) of the three models evaluated using the SciERC Dataset for the machine learning Task of "Synonym Discovery" as published by Thießen et al.
  • Figure 4: Display of the research finding published by Perez-Alvarez et al. in their Figure 4 (a) as a research contribution in ORKG. The two overlays illustrate detailed information in the form of visualizations and tabular data. For an interactive experience, we refer readers to the version published online at https://doi.org/10.48366/R689181.
  • Figure 5: ORKG Word Add-in display of the supplementary data in our use case in computer science for model performance for each dataset in the evaluation. Users provide the JSON-LD supplementary data produced in model performance evaluation and the Add-in automatically renders such TDMS-data as tables, one for each evaluated model.
  • ...and 1 more figures
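
The rendering step described in the Figure 5 caption (JSON-LD supplementary data in, one table per evaluated model out) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ORKG Word Add-in's implementation: the JSON-LD keys (`model`, `dataset`, `metric`, `score`), the `@context` URL, the function names, and all values are hypothetical placeholders rather than the schema or results used in the paper.

```python
import json
from collections import defaultdict

# Hypothetical supplementary data. The keys, the @context URL, and the values
# are placeholders for illustration; they are NOT the paper's schema or results.
SUPPLEMENTARY_JSON = """
{
  "@context": {"@vocab": "https://example.org/tdms#"},
  "@graph": [
    {"model": "model-A", "dataset": "dataset-1", "metric": "F1", "score": 0.5},
    {"model": "model-A", "dataset": "dataset-2", "metric": "F1", "score": 0.25},
    {"model": "model-B", "dataset": "dataset-1", "metric": "F1", "score": 0.75}
  ]
}
"""


def tables_by_model(document: str) -> dict:
    """Group evaluation records into one table (a list of rows) per model."""
    records = json.loads(document)["@graph"]
    tables = defaultdict(list)
    for record in records:
        row = (record["dataset"], record["metric"], record["score"])
        tables[record["model"]].append(row)
    return dict(tables)


def render(tables: dict) -> str:
    """Render each model's rows as a plain-text table."""
    lines = []
    for model, rows in tables.items():
        lines.append(f"Model: {model}")
        lines.append(f"{'Dataset':<12}{'Metric':<8}{'Score':>6}")
        for dataset, metric, score in rows:
            lines.append(f"{dataset:<12}{metric:<8}{score:>6.2f}")
        lines.append("")  # blank line between tables
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(tables_by_model(SUPPLEMENTARY_JSON)))
```

Because the performance records are already structured when the evaluation produces them, the grouping is a plain dictionary operation and no text extraction from the manuscript is involved.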