AI4DiTraRe: Building the BFO-Compliant Chemotion Knowledge Graph

Ebrahim Norouzi, Nicole Jung, Anna M. Jacyszyn, Jörg Waitelonis, Harald Sack

TL;DR

This work presents a pipeline that transforms Chemotion metadata into a BFO-compliant Chemotion Knowledge Graph (Chemotion-KG). JSON-LD metadata are harvested, converted to RDF, and semantically enriched with SPARQL CONSTRUCT queries that align them with NFDICore and ChEBI through Ontology Design Patterns. The approach preserves provenance via named graphs and supports AI-driven reasoning and interoperability. It is backed by daily ingestion and a public SPARQL endpoint. As of July 2025, the KG comprises over 1.46 million triples and tens of thousands of instantiated entities, demonstrating scalable semantification of chemical research data. Future work targets broader data inclusion, cross-resource linking (e.g., PubChem, ChemSpider, NFDI4Chem), SHACL validation, competency questions, and integration with AI methods, including LLM-assisted curation and bridging symbolic and statistical AI.
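Provenance is preserved via named graphs, and data are ingested daily. A minimal Python sketch of that idea follows. It assumes, hypothetically, one named graph per daily harvest run. The graph IRIs under `https://example.org/` and the helpers `graph_for_run`, `in_graph`, `to_nquads`, and `provenance` are illustrative and are not taken from the Chemotion-KG code base. Each triple is attached to the graph of the run that produced it and serialized as N-Quads, so the graphs asserting a given statement can be recovered later.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Literal:
    """A plain string literal; any other term is treated as an IRI or blank node."""
    value: str


def term(t):
    """Render a term in N-Quads syntax: IRIs in <...>, literals quoted and escaped."""
    if isinstance(t, Literal):
        escaped = (t.value.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
        return f'"{escaped}"'
    if t.startswith("_:"):  # blank node labels are written as-is
        return t
    return f"<{t}>"


def graph_for_run(run_day: date) -> str:
    # Hypothetical naming scheme: one named graph per daily ingestion run.
    return f"https://example.org/chemotion-kg/graph/harvest/{run_day.isoformat()}"


def in_graph(triples, graph_iri):
    """Attach every (s, p, o) triple to the given named graph, producing quads."""
    return [(s, p, o, graph_iri) for s, p, o in triples]


def to_nquads(quads):
    """Serialize (s, p, o, g) quads as N-Quads lines."""
    return [f"{term(s)} {term(p)} {term(o)} {term(g)} ." for s, p, o, g in quads]


def provenance(quads, s, p, o):
    """Return, sorted, the named graphs (here: harvest runs) that assert a triple."""
    return sorted({g for qs, qp, qo, g in quads if (qs, qp, qo) == (s, p, o)})


if __name__ == "__main__":
    name = ("https://example.org/dataset/42", "http://schema.org/name",
            Literal("Melting point study"))
    quads = (in_graph([name], graph_for_run(date(2025, 7, 1)))
             + in_graph([name], graph_for_run(date(2025, 7, 2))))
    print("\n".join(to_nquads(quads)))
    print(provenance(quads, *name))  # the 2025-07-01 and 2025-07-02 graphs
```

Under this scheme, a statement harvested on two days appears in two named graphs, and `provenance` reports both. In SPARQL, the same information could be recovered with a `GRAPH ?g { ... }` pattern.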

Abstract

Chemistry is a discipline in which technological advances have led to multi-level, often intricate laboratory processes. These complex workflows are intertwined with information on chemical structures, which is essential for understanding the scientific process. Chemotion, which comprises an electronic lab notebook and a repository, is an important tool for many chemists. This paper introduces a semantic pipeline for constructing the BFO-compliant Chemotion Knowledge Graph (Chemotion-KG), an integrated, ontology-driven representation of chemical research data. The Chemotion-KG was developed to adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) principles and to support AI-driven discovery and reasoning in chemistry. Experimental metadata were harvested from the Chemotion API in JSON-LD format and converted into RDF. They were then transformed into a Basic Formal Ontology (BFO)-aligned graph through SPARQL CONSTRUCT queries. The source code and datasets are publicly available on GitHub, and the Chemotion-KG is hosted by FIZ Karlsruhe, Information Service Engineering. The outcomes presented in this work were achieved within the Leibniz Science Campus "Digital Transformation of Research" (DiTraRe) and are part of an ongoing interdisciplinary collaboration.
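The harvest-and-convert step can be illustrated with a deliberately small JSON-LD flattener. This sketch rests on several assumptions. It handles only an inline `@context` with string-valued prefix or term mappings, along with `@id`, `@type`, nested node objects, and arrays. It ignores value objects, remote contexts, and the rest of the JSON-LD algorithms. A production pipeline would use a conforming JSON-LD processor, such as the JSON-LD parser bundled with rdflib. The record shape, IRIs, and function names (`expand`, `jsonld_to_triples`) are hypothetical and do not reproduce the actual Chemotion API output.

```python
import itertools
import json
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """Marks a plain JSON value as an RDF literal (bare strings are IRIs or blank nodes)."""
    value: object


def expand(term, context):
    """Expand a term or prefixed name (e.g. 'schema:name') using an inline @context."""
    if term in context:
        return context[term]
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # already an absolute IRI, or simply unmapped


def jsonld_to_triples(doc):
    """Flatten one JSON-LD node tree into (subject, predicate, object) triples.

    Nested objects become IRIs (from @id) or blank nodes; plain values become literals.
    """
    context = doc.get("@context", {})
    triples = []
    blank_ids = itertools.count()

    def walk(node):
        subject = expand(node["@id"], context) if "@id" in node else f"_:b{next(blank_ids)}"
        for key, value in node.items():
            if key in ("@context", "@id"):
                continue
            values = value if isinstance(value, list) else [value]
            if key == "@type":
                triples.extend((subject, RDF_TYPE, expand(v, context)) for v in values)
                continue
            predicate = expand(key, context)
            for v in values:
                obj = walk(v) if isinstance(v, dict) else Literal(v)
                triples.append((subject, predicate, obj))
        return subject

    walk(doc)
    return triples


if __name__ == "__main__":
    # Hypothetical record; real Chemotion JSON-LD will differ in shape and vocabulary.
    record = json.loads("""{
      "@context": {"schema": "http://schema.org/"},
      "@id": "https://example.org/chemotion/dataset/1",
      "@type": "schema:Dataset",
      "schema:name": "Example dataset",
      "schema:creator": {"@type": "schema:Person", "schema:name": "A. Chemist"}
    }""")
    for triple in jsonld_to_triples(record):
        print(triple)
```

Nested objects without an `@id` become blank nodes (`_:b0`, `_:b1`, ...), which is roughly how JSON-LD flattening labels anonymous nodes.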

Paper Structure

This paper contains 5 sections and 6 figures.

Figures (6)

  • Figure 1: Schematic workflow of the Chemotion Knowledge Graph construction.
  • Figure 2: Dataset representation in the Chemotion-KG.
  • Figure 3: Process–Agent–Role Ontology Design Pattern.
  • Figure 4: Creator description aligned to NFDICore and BFO patterns, enabling explicit representation of roles and affiliations.
  • Figure 5: Study representation with explicit modeling of publishing processes, temporal regions, and standard profiles.
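Figures 3 and 4 refer to a Process–Agent–Role Ontology Design Pattern for describing creators. Such a pattern typically connects an agent to a process through an explicit role. The following plain-Python sketch mimics what a SPARQL CONSTRUCT rule for such a pattern might emit. For every `schema:creator` triple, it mints a creation process and a creator role, then links the agent, role, process, and dataset. All class and property IRIs under `https://example.org/onto/`, together with the helpers `mint` and `construct_process_agent_role`, are hypothetical placeholders. The paper aligns the pattern with BFO and NFDICore terms that are not listed here.

```python
import hashlib

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_CREATOR = "http://schema.org/creator"
# Hypothetical placeholder namespace standing in for BFO / NFDICore terms.
EX = "https://example.org/onto/"


def mint(kind, *parts):
    """Mint a deterministic IRI from the IRIs it is derived from (idempotent re-runs)."""
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:12]
    return f"https://example.org/kg/{kind}/{digest}"


def construct_process_agent_role(triples):
    """CONSTRUCT-like rewrite: dataset-creator links -> process, agent, and role nodes."""
    out = set()
    for s, p, o in triples:
        if p != SCHEMA_CREATOR:  # plays the role of WHERE { ?dataset schema:creator ?agent }
            continue
        dataset, agent = s, o
        process = mint("creation-process", dataset, agent)
        role = mint("creator-role", dataset, agent)
        out |= {  # plays the role of the CONSTRUCT template
            (process, RDF_TYPE, EX + "CreationProcess"),
            (role, RDF_TYPE, EX + "CreatorRole"),
            (agent, EX + "hasRole", role),
            (role, EX + "realizedIn", process),
            (process, EX + "hasParticipant", agent),
            (process, EX + "hasOutput", dataset),
        }
    return out


if __name__ == "__main__":
    harvested = [
        ("https://example.org/dataset/1", SCHEMA_CREATOR, "https://example.org/person/1"),
        ("https://example.org/dataset/1", "http://schema.org/name", "Example dataset"),
    ]
    for triple in sorted(construct_process_agent_role(harvested)):
        print(triple)
```

Deriving the minted IRIs from a hash of the dataset and agent IRIs keeps the rewrite idempotent. Re-running it on each daily ingestion therefore produces the same nodes rather than duplicates. In SPARQL, a comparable effect could be obtained by building IRIs with `IRI(CONCAT(...))` and `SHA256(...)` instead of `BNODE()`.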