Rethinking the production and publication of machine-reusable expressions of research findings
Markus Stocker, Lauren Snyder, Matthew Anfuso, Oliver Ludwig, Freya Thießen, Kheir Eddine Farfar, Muhammad Haris, Allard Oelen, Mohamad Yaser Jaradeh
TL;DR
This paper introduces reborn, a pre-publication approach that makes machine-reusable scientific knowledge intrinsic to the research lifecycle by extending data analysis with structured data-type schemata and publishing these expressions as interlinked supplementary data via the Open Research Knowledge Graph (ORKG). Through three use cases across soil science, computer science, and agroecology, it reports higher knowledge richness and accuracy than traditional post-publication extraction, and it outlines roles for publishers, template registries, and data interoperability. The work argues that the approach is technically feasible and scalable through community-driven templates and integration with FAIR data practices, while acknowledging limitations in qualitative knowledge transfer and retroactive adoption. It also outlines future directions, including decoupling data types from ORKG, improving terminological annotation, and developing tools to assist manuscript writing from machine-reusable knowledge.
Abstract
Literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine reusable. To facilitate knowledge reuse, e.g., for synthesis research, scientific knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccuracies associated with completing these activities manually have driven the development of techniques that automate knowledge extraction. Tackling the problem with a different mindset, we propose a pre-publication approach, known as reborn, that ensures scientific knowledge is born reusable, i.e., produced in a machine-reusable format during knowledge production. We implement the approach using the Open Research Knowledge Graph infrastructure for FAIR scientific knowledge organization. We test the approach with three use cases and discuss the role of publishers and editors in scaling the approach. Our results suggest that the proposed approach is superior to classical manual and semi-automated post-publication extraction techniques in terms of knowledge richness and accuracy as well as technological simplicity.
