Experiversum: an Ecosystem for Curating and Enhancing Data-Driven Experimental Science
Genoveva Vargas-Solar, Umberto Costa, Jérôme Darmont, Javier Espinosa-Oviedo, Carmem Hara, Sabine Loudcher, Regina Motz, Martin A. Musicante, José-Luis Zechinelli-Martini
TL;DR
Experiversum addresses the challenge of reproducing data-driven experiments across diverse domains by integrating a lakehouse architecture with a metadata-driven metamodel that couples raw data, experimental actions, and contextual decisions. It supports end-to-end ELT pipelines, provenance-aware curation, and exploratory analytics within a unified environment, aiming to close the gap between exploration and reproducibility. The authors validate the approach through case studies in biodiversity, seismology, and political messaging analysis, highlighting improvements in traceability, interpretability, and collaborative workflows. The work suggests that preserving full experimental context and structured metadata is crucial for reusable, cross-disciplinary science, and outlines future enhancements such as NLP-based tagging and privacy-aware analytics.
Abstract
This paper introduces Experiversum, a lakehouse-based ecosystem that supports the curation, documentation and reproducibility of exploratory experiments. It supports structured research through iterative data cycles while capturing metadata and collaborative decisions. Case studies in Earth, Life and Political Sciences demonstrate how it promotes transparent workflows and multi-perspective interpretation of results. By bridging exploratory and reproducible research, Experiversum encourages accountable and robust data-driven practices across disciplines.
