Advancing Scientific Knowledge Retrieval and Reuse with a Novel Digital Library for Machine-Readable Knowledge
Hadi Ghaemi, Lauren Snyder, Markus Stocker
TL;DR
Current digital libraries are document-centric and not readily machine-readable, which hinders the reuse of scientific knowledge in synthesis research. The paper introduces ORKG reborn, a three-layer digital library that publishes machine-readable scientific knowledge as reborn articles, in which statements are linked to their supporting evidence in the form of data and code. The architecture combines a Data Type Registry and RO-Crates for data deposition, Elasticsearch and Faiss for storage and search, and a hybrid retrieval approach that uses dense vectors, keyword search, and cross-encoder re-ranking. This approach improves transparency, reproducibility, and reuse, and enables novel information retrieval for synthesis and cross-domain knowledge integration. Future work will expand the supported knowledge types and support synthesis use cases.
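The hybrid retrieval flow named above (keyword search, dense vectors, re-ranking) can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: the toy corpus, the three-dimensional embeddings, the use of reciprocal rank fusion to merge the two rankings, and the `overlap_scorer` stand-in for a cross-encoder. The summary does not state how the rankings are combined, and the described system uses Elasticsearch and Faiss rather than the in-memory stand-ins shown here.

```python
import math

# Hypothetical toy corpus: id -> (text, embedding). Not real ORKG data.
DOCS = {
    "s1": ("linear mixed model fitted with lme4 on soil moisture", [0.9, 0.1, 0.0]),
    "s2": ("transformer model for text classification", [0.1, 0.9, 0.1]),
    "s3": ("soil organic carbon regression analysis", [0.7, 0.2, 0.3]),
}


def keyword_rank(query):
    """Rank documents by number of shared terms (stand-in for keyword search)."""
    terms = set(query.lower().split())
    scores = {d: len(terms & set(text.split())) for d, (text, _) in DOCS.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def dense_rank(query_vec):
    """Rank documents by cosine similarity (stand-in for a vector index)."""
    return sorted(DOCS, key=lambda d: (-cosine(query_vec, DOCS[d][1]), d))


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; the fusion method is an assumption of this sketch."""
    fused = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=lambda d: (-fused[d], d))


def overlap_scorer(query, text):
    """Placeholder for a cross-encoder: fraction of query terms found in the text."""
    terms = set(query.lower().split())
    return len(terms & set(text.split())) / len(terms)


def rerank(query, candidates, score_fn=overlap_scorer):
    """Re-order fused candidates with a (query, text) relevance scorer."""
    return sorted(candidates, key=lambda d: -score_fn(query, DOCS[d][0]))


query = "soil model"
query_vec = [1.0, 0.0, 0.0]  # hypothetical query embedding
candidates = reciprocal_rank_fusion([keyword_rank(query), dense_rank(query_vec)])
results = rerank(query, candidates)
```

In a real deployment the `score_fn` argument would wrap an actual cross-encoder model, and the two ranking functions would be replaced by calls to the keyword and vector indexes.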
Abstract
Digital libraries for research, such as the ACM Digital Library or Semantic Scholar, do not enable efficient, machine-supported reuse of scientific knowledge (e.g., in synthesis research). These libraries are built on document-centric models that express knowledge as narrative text, which requires manual or semi-automated knowledge extraction, structuring, and organization. We present ORKG reborn, an emerging digital library that supports finding, accessing, and reusing accurate, fine-grained, and reproducible machine-readable expressions of scientific knowledge that relate scientific statements to their supporting evidence in the form of data and code. These rich expressions of scientific knowledge are published as reborn (born-reusable) articles and provide novel possibilities for scientific knowledge retrieval, for instance by statistical methods, software packages, variables, or data matching specific constraints. We describe the proposed system and demonstrate its practical viability and potential for information retrieval, in contrast to state-of-the-art digital libraries and document-centric scholarly communication, using several published articles in research fields ranging from computer science to soil science. Our work underscores the enormous potential of scientific knowledge databases and presents a viable approach to their construction.
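To make the idea of retrieval by statistical method, software package, variable, or data constraint more concrete, the following is a minimal sketch of such a query over machine-readable statements. The `Statement` record, its fields, the `find_statements` helper, and all sample values are hypothetical placeholders chosen for illustration; they do not reflect the actual ORKG reborn schema, API, or content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical machine-readable statement with its supporting evidence."""
    article: str      # placeholder article identifier
    claim: str        # the scientific statement
    method: str       # statistical method used as evidence
    software: str     # software package that produced the result
    variables: tuple  # variables involved in the analysis
    p_value: float    # one example of a data value to constrain on


# Placeholder records, invented for this sketch only.
STATEMENTS = [
    Statement("article-A", "placeholder claim 1", "linear mixed model",
              "lme4", ("soil_moisture", "yield"), 0.01),
    Statement("article-B", "placeholder claim 2", "t-test",
              "scipy", ("accuracy",), 0.20),
    Statement("article-C", "placeholder claim 3", "linear mixed model",
              "lme4", ("soil_ph",), 0.08),
]


def find_statements(statements, method=None, software=None,
                    variable=None, max_p_value=None):
    """Return statements matching every constraint that is not None."""
    hits = []
    for s in statements:
        if method is not None and s.method != method:
            continue
        if software is not None and s.software != software:
            continue
        if variable is not None and variable not in s.variables:
            continue
        if max_p_value is not None and s.p_value > max_p_value:
            continue
        hits.append(s)
    return hits


# Example: all statements backed by a linear mixed model with p <= 0.05.
significant_lmm = find_statements(
    STATEMENTS, method="linear mixed model", max_p_value=0.05
)
```

A query like this is not possible against narrative text without first extracting the method, software, and values, which is the manual or semi-automated step the abstract identifies as the limitation of document-centric libraries.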
