Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab
Leon Gorißen, Jan-Niklas Schneider, Mohamed Behery, Philipp Brauner, Moritz Lennartz, David Kötter, Thomas Kaster, Oliver Petrovic, Christian Hinke, Thomas Gries, Gerhard Lakemeyer, Martina Ziefle, Christian Brecher, Constantin Häfner
TL;DR
This paper addresses data integration challenges in increasingly interconnected production environments by proposing Data-to-Knowledge (d2k) and Knowledge-to-Data (k2d) pipelines built on Digital Shadows to connect production across organizations and lifecycle stages. It introduces a framework inspired by the World Wide Lab (WWL) and demonstrates a proof of concept in which semantically annotated trajectory data from multiple robots across organizations is used to train a cross-domain inverse-dynamics foundation model, leveraging a shared research data repository and data lakehouse concepts. The work shows that fine-tuning foundation and instance models on aggregated multi-institution data yields faster convergence and competitive accuracy, highlighting gains in data reuse, scalability, and cross-organizational collaboration while addressing provenance and FAIR data principles. Collectively, the approach advances intelligent, adaptive production systems for Industry 4.0 by enabling controlled data sharing, robust knowledge transfer, and governance-aware decision-making across the World Wide Lab ecosystem.
Abstract
The digital transformation of production requires new methods of data integration and storage, as well as decision-making and support systems that work vertically and horizontally throughout the development, production, and use cycle. In this paper, we propose Data-to-Knowledge (and Knowledge-to-Data) pipelines for production as a universal concept building on a network of Digital Shadows (a concept augmenting Digital Twins). We show a proof of concept that builds on and bridges existing infrastructure and consists of 1) a process that captures and semantically annotates trajectory data from multiple similar but independent robots in different organisations and use cases and stores it in a data lakehouse, and 2) an independent process that dynamically queries matching data to train an inverse dynamics foundation model for robotic control. The article discusses the challenges and benefits of this approach and, as a research outlook, how Data-to-Knowledge pipelines contribute to efficiency gains and industrial scalability in a World Wide Lab.
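The two-stage idea in the abstract (annotate trajectories on capture, then query matching records for training) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TrajectoryRecord` fields, the `query_matching` and `pool_training_pairs` helpers, the robot and organisation labels, and the numeric samples are hypothetical placeholders, not the paper's schema, infrastructure, or data. An in-memory list stands in for the data lakehouse.

```python
from dataclasses import dataclass, field


@dataclass
class TrajectoryRecord:
    """One annotated trajectory; all field names are hypothetical."""
    robot_model: str
    organisation: str
    use_case: str
    # Each sample: (position, velocity, acceleration, measured torque) for one joint.
    samples: list = field(default_factory=list)


def query_matching(records, robot_model):
    """Return the records whose annotation matches the requested robot model."""
    return [r for r in records if r.robot_model == robot_model]


def pool_training_pairs(records):
    """Flatten records into (input, target) pairs for an inverse dynamics model."""
    pairs = []
    for record in records:
        for q, qd, qdd, tau in record.samples:
            pairs.append(((q, qd, qdd), tau))
    return pairs


# Placeholder records standing in for the lakehouse contents (invented values).
lakehouse = [
    TrajectoryRecord("arm-A", "org-1", "welding",
                     [(0.0, 0.1, 0.0, 1.2), (0.1, 0.1, 0.2, 1.5)]),
    TrajectoryRecord("arm-A", "org-2", "handling",
                     [(0.5, 0.0, -0.1, 0.9)]),
    TrajectoryRecord("arm-B", "org-1", "milling",
                     [(0.2, 0.3, 0.1, 2.0)]),
]

matching = query_matching(lakehouse, "arm-A")
training_pairs = pool_training_pairs(matching)
print(len(matching), len(training_pairs))
```

In this sketch the training side only needs the annotation it filters on, so records from different organisations and use cases are pooled without the consumer knowing where they were captured. A real pipeline would presumably replace the list with a query against the shared repository and add provenance and access control.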
