Experiversum: an Ecosystem for Curating and Enhancing Data-Driven Experimental Science

Genoveva Vargas-Solar, Umberto Costa, Jérôme Darmont, Javier Espinosa-Oviedo, Carmem Hara, Sabine Loudcher, Regina Motz, Martin A. Musicante, José-Luis Zechinelli-Martini

TL;DR

Experiversum addresses the challenge of reproducing data-driven experiments across diverse domains by integrating a lakehouse architecture with a metadata-driven metamodel that couples raw data, experimental actions, and contextual decisions. It enables end-to-end ELT pipelines, provenance-aware curation, and exploratory analytics within a unified environment, aiming to close the gap between exploration and reproducibility. The authors validate the approach through case studies in biodiversity, seismology, and political messaging analysis, highlighting improvements in traceability, interpretability, and collaborative workflows. The work suggests that preserving full experimental context and structured metadata is crucial for reusable, cross-disciplinary science, and outlines future enhancements such as NLP-based tagging and privacy-aware analytics.

Abstract

This paper introduces Experiversum, a lakehouse-based ecosystem that supports the curation, documentation and reproducibility of exploratory experiments. Experiversum enables structured research through iterative data cycles, while capturing metadata and collaborative decisions. Demonstrated through case studies in Earth, Life and Political Sciences, Experiversum promotes transparent workflows and multi-perspective result interpretation. Experiversum bridges exploratory and reproducible research, encouraging accountable and robust data-driven practices across disciplines.

Paper Structure

This paper contains 8 sections and 3 figures.

Figures (3)

  • Figure 1: Experiversum Architecture
  • Figure 2: Query visualisation in Experiversum
  • Figure 3: Data Metamodel UML Class Diagram