
S4CMDR: a metadata repository for electronic health records

Jiawei Zhao, Md Shamim Ahmed, Nicolai Dinh Khang Truong, Verena Schuster, Rudolf Mayer, Richard Röttger

Abstract

Background: Electronic health records (EHRs) enable machine learning for diagnosis, prognosis, and clinical decision support. However, EHR standards vary by country and hospital, so records are often incompatible. This limits large-scale, cross-clinical machine learning. To address this complexity, a metadata repository that catalogues the available data elements, their value domains, and their compatibility is an essential tool. It allows researchers to leverage relevant data for tasks such as identifying undiagnosed rare disease patients.

Results: Within the Screen4Care project, we developed S4CMDR, an open-source metadata repository built on ISO 11179-3 that follows a middle-out metadata standardisation approach. It automates cataloguing to reduce errors and to enable the discovery of compatible feature sets across data registries. S4CMDR supports on-premise Linux deployment and cloud hosting, with state-of-the-art user authentication and an accessible interface.

Conclusions: S4CMDR is a clinical metadata repository for registering and discovering compatible EHR records. Its novel contributions include a microservice architecture, a middle-out standardisation approach, and a user-friendly interface for error-free data registration and visualisation of metadata compatibility. We validate S4CMDR through case studies involving rare disease patients. We invite clinical data holders to populate S4CMDR with their metadata to validate its generalisability and support further development.
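To make the idea of "compatible" data elements more concrete, here is a minimal Python sketch built on simplifying assumptions that are ours, not S4CMDR's. It assumes each registered data element carries a Data_Element_Concept identifier and the enumerated permissible values of its value domain, and that two elements from different registries count as compatible when they share the concept and at least one permissible value. The class `DataElement`, the functions `compatible` and `compatible_pairs`, and all registry, field, and value names are hypothetical illustrations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical, simplified record for one registered data element."""
    registry: str                       # which data registry holds the element
    name: str                           # local field name in that registry
    concept: str                        # assumed Data_Element_Concept identifier
    permissible_values: frozenset[str]  # enumerated values of its value domain


def compatible(a: DataElement, b: DataElement) -> bool:
    """Assumed rule: same concept and at least one shared permissible value."""
    return a.concept == b.concept and bool(a.permissible_values & b.permissible_values)


def compatible_pairs(elements: list[DataElement]) -> list[tuple[str, str]]:
    """List (name, name) pairs from different registries that satisfy compatible()."""
    pairs = []
    for i, a in enumerate(elements):
        for b in elements[i + 1:]:
            if a.registry != b.registry and compatible(a, b):
                pairs.append((a.name, b.name))
    return pairs


# Hypothetical catalogue entries from two registries.
catalogue = [
    DataElement("registry_A", "sex", "SEX", frozenset({"female", "male"})),
    DataElement("registry_B", "gender_code", "SEX", frozenset({"female", "male", "unknown"})),
    DataElement("registry_B", "onset_age_group", "ONSET_AGE", frozenset({"infant", "child"})),
]
print(compatible_pairs(catalogue))  # [('sex', 'gender_code')]
```

The sketch compares only pairs drawn from different registries, which matches the abstract's focus on compatible feature sets across registries. A production rule would likely need more than set overlap, for example mappings between coding systems.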

Paper Structure

This paper contains 28 sections, 7 figures, and 1 table.

Figures (7)

  • Figure 1: Six core metadata classes. The terminology used in this figure is adopted throughout this manuscript.
  • Figure 2: This figure illustrates a partial directed acyclic graph (DAG) of three NCI Thesaurus terms that serve as Data_Element_Concepts and two NCI Thesaurus terms that serve as Conceptual_Domains. The Data_Element_Concept with more than one parent is shown in a blue rectangular block, the Data_Element_Concepts each with only one parent are shown in light blue rectangular blocks, and Conceptual_Domains are shown in light yellow rectangular blocks.
  • Figure 3: This figure illustrates a partial directed acyclic graph (DAG) of two HPO or SNOMED CT terms that serve as Permissible_Values and two HPO or SNOMED CT terms that serve as Value_Domains. The Permissible_Value with more than one parent is shown in a blue rectangular block, the Permissible_Value with only one parent is shown in a light blue rectangular block, and Value_Domains are shown in light yellow rectangular blocks.
  • Figure 4: This figure illustrates a partial directed acyclic graph (DAG) of two NCI Thesaurus terms that serve as Conceptual_Domains, one LOINC answer list that serves as a Value_Domain, and five valid LOINC answers that serve as Permissible_Values. The Conceptual_Domains are shown in light green rectangular blocks, the Value_Domain is shown in a light yellow rectangular block, and Permissible_Values are shown in light blue rectangular blocks.
  • Figure 5: Compared to the ISO 11179 metadata model shown in Figure 1, the MDR core metadata model introduces three changes. First, the mapping between Conceptual_Domain and Data_Element_Concept is changed from one-to-many to many-to-many. Second, the mapping between Conceptual_Domain and Value_Domain is changed from one-to-many to many-to-many. Finally, the mapping between Value_Domain and Permissible_Value is changed from one-to-many to many-to-many.
  • ...and 2 more figures
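Figure 5 relaxes three one-to-many relations into many-to-many ones, and Figures 2 to 4 highlight terms that have more than one parent. One natural way to hold such relations is a directed acyclic graph. The following is a minimal Python sketch of that idea, not S4CMDR's implementation. The class `MetadataGraph` and its methods are hypothetical. The node identifiers (`CD_1`, `DEC_2`, ...) are placeholders rather than real NCI Thesaurus, HPO, SNOMED CT, or LOINC codes. The sketch also assumes that the broader term, such as a Conceptual_Domain, acts as the parent.

```python
from __future__ import annotations

from collections import defaultdict


class MetadataGraph:
    """Hypothetical DAG of metadata terms with many-to-many parent/child links."""

    def __init__(self) -> None:
        self.kind: dict[str, str] = {}  # node id -> metadata class name
        self.parents: defaultdict[str, set[str]] = defaultdict(set)   # child -> parents
        self.children: defaultdict[str, set[str]] = defaultdict(set)  # parent -> children

    def add_node(self, node: str, kind: str) -> None:
        self.kind[node] = kind

    def add_edge(self, parent: str, child: str) -> None:
        """Link parent -> child; a child may have several parents (many-to-many)."""
        if parent not in self.kind or child not in self.kind:
            raise KeyError("register both nodes before linking them")
        # The new edge would close a cycle if parent is already reachable from child.
        if self._reachable(child, parent):
            raise ValueError(f"edge {parent} -> {child} would create a cycle")
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def _reachable(self, start: str, target: str) -> bool:
        """Depth-first search along parent -> child edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.children[node])
        return False

    def multi_parent(self, kind: str) -> list[str]:
        """Nodes of one class with more than one parent (cf. the blue blocks)."""
        return sorted(
            n for n, k in self.kind.items() if k == kind and len(self.parents[n]) > 1
        )


# Hypothetical placeholder ids shaped like Figure 2: two CDs, three DECs.
g = MetadataGraph()
for cd in ("CD_1", "CD_2"):
    g.add_node(cd, "Conceptual_Domain")
for dec in ("DEC_1", "DEC_2", "DEC_3"):
    g.add_node(dec, "Data_Element_Concept")
g.add_edge("CD_1", "DEC_1")
g.add_edge("CD_1", "DEC_2")
g.add_edge("CD_2", "DEC_2")  # second parent: allowed by the many-to-many relation
g.add_edge("CD_2", "DEC_3")
print(g.multi_parent("Data_Element_Concept"))  # ['DEC_2']
```

Storing both parent and child adjacency sets keeps "which parents does this term have?" and "which terms does this domain cover?" equally cheap. The cycle check keeps the structure a DAG as more links are added.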