Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab

Leon Gorißen, Jan-Niklas Schneider, Mohamed Behery, Philipp Brauner, Moritz Lennartz, David Kötter, Thomas Kaster, Oliver Petrovic, Christian Hinke, Thomas Gries, Gerhard Lakemeyer, Martina Ziefle, Christian Brecher, Constantin Häfner

TL;DR

This paper addresses data integration challenges in increasingly interconnected production environments by proposing Data-to-Knowledge (d2k) and Knowledge-to-Data (k2d) pipelines built on Digital Shadows to connect production across organizations and lifecycle stages. It introduces a framework inspired by the World Wide Lab (WWL) and demonstrates a proof of concept in which semantically annotated trajectory data from multiple robots across organizations are used to train a cross-domain inverse-dynamics foundation model, drawing on a shared research data repository and data lakehouse concepts. The work shows that fine-tuning foundation and instance models on aggregated multi-institution data yields faster convergence and competitive accuracy, highlighting gains in data reuse, scalability, and cross-organizational collaboration while addressing provenance and FAIR data principles. Collectively, the approach advances intelligent, adaptive production systems for Industry 4.0 by enabling controlled data sharing, robust knowledge transfer, and governance-aware decision-making across the World Wide Lab ecosystem.

Abstract

The digital transformation of production requires new methods of data integration and storage, as well as decision-making and support systems that work vertically and horizontally throughout the development, production, and use cycle. In this paper, we propose Data-to-Knowledge (and Knowledge-to-Data) pipelines for production as a universal concept building on a network of Digital Shadows (a concept augmenting Digital Twins). We show a proof of concept that builds on and bridges existing infrastructure to 1) capture and semantically annotate trajectory data from multiple similar but independent robots in different organisations and use cases in a data lakehouse and 2) run an independent process that dynamically queries matching data for training an inverse-dynamics foundation model for robotic control. The article discusses the challenges and benefits of this approach and, as a research outlook, how Data-to-Knowledge pipelines contribute efficiency gains and industrial scalability in a World Wide Lab.
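
The two steps of the proof of concept (storing annotated trajectories, then querying matching data for training) can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields, the function names `query_matching` and `to_training_pairs`, the use-case labels, and all numeric values are hypothetical, and a plain in-memory list stands in for the data lakehouse. The organisation labels and the Franka robot are taken from the figure captions.

```python
from dataclasses import dataclass, field


@dataclass
class TrajectoryRecord:
    """One annotated trajectory; all field names are hypothetical."""
    organisation: str   # e.g. "WZL" or "ITA", as named in the Figure 4 caption
    robot_model: str    # semantic annotation used to find matching data
    use_case: str       # made-up label for illustration
    joint_positions: list[list[float]] = field(default_factory=list)
    joint_torques: list[list[float]] = field(default_factory=list)


def query_matching(records, robot_model):
    """Return every record annotated with the requested robot model."""
    return [r for r in records if r.robot_model == robot_model]


def to_training_pairs(records):
    """Flatten records into (joint position, joint torque) samples."""
    pairs = []
    for r in records:
        pairs.extend(zip(r.joint_positions, r.joint_torques))
    return pairs


# Toy stand-in for the shared repository; every value here is made up.
repository = [
    TrajectoryRecord("WZL", "franka_panda", "assembly",
                     [[0.0, 0.1]], [[1.0, 2.0]]),
    TrajectoryRecord("ITA", "franka_panda", "textile_handling",
                     [[0.2, 0.3], [0.4, 0.5]], [[1.5, 2.5], [3.0, 3.5]]),
    TrajectoryRecord("WZL", "other_arm", "milling",
                     [[0.9, 0.9]], [[9.0, 9.0]]),
]

# The training process queries by annotation, independent of who wrote the data.
matches = query_matching(repository, "franka_panda")
training_pairs = to_training_pairs(matches)
```

The sketch only pairs joint positions with torques to stay short; an inverse-dynamics model would typically also take joint velocities and accelerations as inputs. The point is the decoupling: the training side selects data by its annotations, so new organisations can contribute records without changes to the training process.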

Paper Structure

This paper contains 17 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Illustration of the steps within the proposed pipelines. Each pipeline itself is a directed acyclic graph, but a Data-to-Knowledge pipeline can be the starting point for a Knowledge-to-Data pipeline and vice versa.
  • Figure 2: In this network of d2k pipelines, trajectory data from multiple sources in different organizations are stored in a research data management repository and semantically annotated (lower use cases). The training instance (upper use case) queries available training data and provides a foundation model. Instance-specific models are derived from the foundation model by the original use cases or by third-party use cases.
  • Figure 3: Franka Emika Robots integrated into a d2k pipeline to train a foundation model of robot dynamics.
  • Figure 4: Joint position histogram across datasets for each axis. Bars are colourless for readability. The plot for axis two of the robot highlights the effect of workspace restrictions for WZL and ITA on dataset distribution.
  • Figure 5: Boxplot of runtime distributions for different training setups (log scale). Fine-tuning approaches are faster than end-to-end training in every metric.
  • ...and 1 more figure