CollectionLocator Level 1: Metadata-Based Search for Collections in Federated Biobanks
Volodymyr A. Shekhovtsov, Bence Slajcho, Aron Sacherer, Johann Eder
TL;DR
Biobank data is heterogeneous, and privacy and interoperability constraints make it hard for researchers to locate collections that meet their study requirements. The paper presents CollectionLocator Level 1, a metadata-based, ontology-driven federated search prototype that stores both content metadata and quality metadata about biobank collections, so that queries can be answered without exposing raw data. It uses OMOP CDM concepts for semantic annotation and provides concept-based and quality-based search. Validation on a BBMRI-ERIC colorectal cohort shows that the prototype retrieves matching collections while respecting the concept hierarchies. The work targets findability, interoperability, and privacy in biobank data discovery, and it outlines a roadmap toward indexing data-item contents and integrating into FAIRification workflows.
Abstract
Biobanks are indispensable resources for medical research: they collect biological material and associated data and make them available for research projects and medical studies. To serve this purpose, biobank data has to meet certain criteria, which can be formulated as adherence to the FAIR (findable, accessible, interoperable, and reusable) principles. We developed a tool, CollectionLocator, which aims at increasing the FAIR compliance of biobank data by helping researchers identify which biobanks and which collections are likely to contain cases (material and data) that satisfy the requirements of a defined research project when the detailed sample data is not available due to privacy restrictions. CollectionLocator builds on an ontology-based metadata model to address the enormous heterogeneity of biobank data and to protect the privacy of the donors of the biological samples and data. Furthermore, CollectionLocator represents the data and metadata quality of the collections, so that the quality requirements of the requester can be matched against the quality of the available data. The concept of CollectionLocator is evaluated with a proof-of-concept implementation.
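To make the idea of matching a request against collection-level metadata more concrete, the following is a minimal sketch in Python. It is not the CollectionLocator implementation: the concept labels, the `PARENT` hierarchy, the `CollectionMetadata` class, and the `completeness` quality indicator are all hypothetical stand-ins chosen for illustration. The sketch assumes that each collection is annotated with ontology concepts and quality scores, and that a collection annotated with a more specific concept should also match a request for a broader one.

```python
from dataclasses import dataclass, field

# Hypothetical toy hierarchy (child -> parent); a real system would take
# this from an ontology rather than hard-code it.
PARENT = {
    "colon cancer": "colorectal cancer",
    "rectal cancer": "colorectal cancer",
    "colorectal cancer": "neoplasm",
}


def ancestors_or_self(concept):
    """Yield the concept itself, then each of its ancestors in turn."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


@dataclass
class CollectionMetadata:
    """Collection-level metadata only; no sample or donor data is stored."""
    name: str
    concepts: set
    quality: dict = field(default_factory=dict)


def matches(collection, wanted_concept, min_quality):
    # Concept match: the wanted concept equals, or is an ancestor of,
    # at least one concept the collection is annotated with.
    concept_ok = any(
        wanted_concept in ancestors_or_self(c) for c in collection.concepts
    )
    # Quality match: every requested indicator meets its minimum;
    # a missing indicator is treated as 0.0 (an assumption of this sketch).
    quality_ok = all(
        collection.quality.get(key, 0.0) >= minimum
        for key, minimum in min_quality.items()
    )
    return concept_ok and quality_ok


def search(collections, wanted_concept, min_quality):
    """Return the names of all collections that satisfy the request."""
    return [c.name for c in collections if matches(c, wanted_concept, min_quality)]


# Hypothetical catalog of three collections.
catalog = [
    CollectionMetadata("A", {"colon cancer"}, {"completeness": 0.9}),
    CollectionMetadata("B", {"rectal cancer"}, {"completeness": 0.6}),
    CollectionMetadata("C", {"neoplasm"}, {"completeness": 0.95}),
]

result = search(catalog, "colorectal cancer", {"completeness": 0.8})
print(result)  # ['A']
```

In this toy catalog, collection B is excluded by the quality requirement, and collection C is excluded because its annotation is broader than the requested concept. Only metadata is consulted, which is the property that allows such a search to run without access to the underlying sample data.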
