Space of Data through the Lens of Multilevel Graph
Marco Caputo, Michele Russo, Emanuela Merelli
TL;DR
The paper tackles the complexity of dataspaces by introducing a multilevel graph that represents datasets across multiple abstraction layers and supports traceable contraction and expansion. It gives formal definitions (decontractible graphs, contraction, natural transformation) and describes the MGDA pipeline, which maps raw data through cleaning, normalization, and recursive feature detection into higher-level relational structures suitable for standard graph analytics. Preliminary validation on unstructured dream narratives indicates that contraction reduces noise while preserving traceability, and yields informative topological metrics such as assortativity, contraction percentage, and density. The work presents MGDA as a promising framework for modeling dataspaces and enabling incremental, pay-as-you-go querying, with future directions including application to structured data and richer metadata integration.
Abstract
This work seeks to tackle the inherent complexity of dataspaces by introducing a novel data structure that can represent datasets across multiple levels of abstraction, ranging from local to global. We propose the concept of a multilevel graph, which is equipped with two fundamental operations: contraction and expansion of its topology. This multilevel graph is specifically designed to fulfil the requirements for incremental abstraction and flexibility, as outlined in existing definitions of dataspaces. Furthermore, we provide a comprehensive suite of methods for manipulating this graph structure, establishing a robust framework for data analysis. While its effectiveness has been empirically validated for unstructured data, its application to structured data is also inherently viable. Preliminary results are presented through a real-world scenario based on a collection of dream reports.
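The two operations named above, contraction and expansion, can be illustrated with a minimal sketch. It is not the paper's formal construction: the function names `contract`, `expand`, and `density`, the partition-based grouping, and the toy graph are all assumptions made for illustration. The sketch merges groups of nodes into supernodes while recording which nodes and edges each supernode absorbed, so that any supernode can later be expanded back to its original content.

```python
from collections import defaultdict


def contract(edges, partition):
    """Contract an undirected graph according to a node partition.

    edges: iterable of (u, v) pairs.
    partition: dict mapping every node to a supernode label (hypothetical input).
    Returns (super_edges, members, internal), where members and internal
    record what each supernode absorbed, keeping the contraction traceable.
    """
    members = defaultdict(set)
    for node, label in partition.items():
        members[label].add(node)

    super_edges = set()
    internal = defaultdict(set)
    for u, v in edges:
        a, b = partition[u], partition[v]
        if a == b:
            # Edge lies inside one supernode: hide it, but remember it.
            internal[a].add(frozenset((u, v)))
        else:
            # Edge crosses supernodes: keep one edge at the higher level.
            super_edges.add(frozenset((a, b)))
    return super_edges, dict(members), dict(internal)


def expand(label, members, internal):
    """Recover the nodes and edges that a supernode absorbed."""
    return members[label], internal.get(label, set())


def density(n_nodes, n_edges):
    """Standard density of a simple undirected graph."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Toy path graph a-b-c-d-e, grouped into two supernodes X and Y.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
partition = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}

super_edges, members, internal = contract(edges, partition)
print(sorted(sorted(e) for e in super_edges))
print(expand("X", members, internal))
print(density(len(partition), len(edges)), density(len(members), len(super_edges)))
```

Keeping the absorbed nodes and edges alongside the contracted graph is one simple way to obtain traceability; the paper's own definitions of decontractible graphs may differ in detail.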
