A Unified Metamodel for NoSQL and Relational Databases

Carlos J. Fernández Candel, Diego Sevilla Ruiz, Jesús J. García-Molina

TL;DR

This paper addresses the fragmentation of data models across relational and NoSQL databases by introducing U-Schema, a unified metamodel that represents logical schemas for relational, columnar, document, key-value, and graph stores. It formally defines forward mappings from each data model to U-Schema and reverse mappings back, and it incorporates structural variability and explicit relationship types to reflect how data is actually stored in schemaless systems. A common two-stage, MapReduce-based strategy is proposed to extract unified schemas, and the approach is validated on Neo4j, MongoDB, Redis, Cassandra, and HBase, with MySQL covering the relational case. The approach enables cross-model database tooling, migrations, generic schema querying, dataset generation, and visualization, all built on the EMF/Ecore ecosystem for reuse and extensibility. Overall, U-Schema is a step toward practical polyglot persistence tooling: it provides a single logical framework for diverse data models together with scalable schema extraction.

Abstract

The database field is undergoing significant changes. Although relational systems are still predominant, interest in NoSQL systems is continuously increasing. In this scenario, polyglot persistence is envisioned as the database architecture that will prevail in the future. Multi-model database tools normally use a generic or unified metamodel to represent schemas of the data models that they support. Such metamodels facilitate developing utilities, as they can be built on a common representation. Also, the number of mappings required to migrate databases from one data model to another is reduced, and integrability is favored. In this paper, we present the U-Schema unified metamodel, which is able to represent logical schemas for the four most popular NoSQL paradigms (columnar, document, key-value, and graph) as well as relational schemas. We will formally define the mappings between U-Schema and the data model defined for each paradigm. How these mappings have been implemented and validated will be discussed, and some applications of U-Schema will be shown. To achieve flexibility to respond to data changes, most NoSQL systems are "schema-on-read," and the declaration of schemas is not required. Such an absence of schema declaration makes structural variability possible, i.e., stored data of the same entity type can have different structures. Moreover, the data relationships supported by each data model are different. We will show how all these issues have been tackled in our approach. Our metamodel goes beyond the existing proposals by distinguishing entity types and relationship types, representing aggregation and reference relationships, and including the notion of structural variability. Our contributions also include developing schema extraction strategies for schemaless systems of each NoSQL data model, and tackling performance and scalability in the implementation for each store.
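
As an illustration of what structural variability means in practice, the sketch below groups stored objects of the same entity type by their structure (the set of property names and coarse value types) and counts how many objects share each structure. It is a minimal, in-memory sketch loosely shaped like a map step followed by a reduce step; it is not the authors' extraction implementation. The function names, the type labels, and the sample "User" objects are all hypothetical.

```python
from collections import defaultdict


def type_name(value):
    """Return a coarse type label for a stored value (labels are hypothetical)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Aggregate"
    return "Null"


def structure_of(doc):
    """Map step: reduce one object to a sorted tuple of (property, type) pairs."""
    return tuple(sorted((name, type_name(value)) for name, value in doc.items()))


def infer_variations(objects):
    """Reduce step: count objects per (entity type, structure)."""
    counts = defaultdict(lambda: defaultdict(int))
    for entity, doc in objects:
        counts[entity][structure_of(doc)] += 1
    return {entity: dict(per_structure) for entity, per_structure in counts.items()}


# Hypothetical sample data: three "User" objects with two distinct structures.
users = [
    ("User", {"name": "Ann", "email": "ann@example.org"}),
    ("User", {"name": "Bob", "email": "bob@example.org"}),
    ("User", {"name": "Eve", "age": 31}),
]

variations = infer_variations(users)
for structure, count in variations["User"].items():
    print(count, structure)
```

Here the "User" entity type ends up with two structural variations, one per distinct set of typed properties. A real extractor would also need to handle nested aggregates, references between entities, and datasets too large to hold in memory.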

Paper Structure

This paper contains 56 sections, 16 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: U-Schema Metamodel.
  • Figure 2: Generic Schema Extraction Strategy.
  • Figure 3: "User profile" running example schema.
  • Figure 4: Inference to Query time ratio.
  • Figure 5: Graph Data Model.
  • ...and 12 more figures