Graph-centric Cross-model Data Integration and Analytics in a Unified Multi-model Database

Zepeng Liu, Sheng Wang, Shixun Huang, Hailang Qiu, Yuwei Peng, Jiale Feng, Shunan Liao, Yushuai Ji, Zhiyong Peng

TL;DR

This paper proposes GredoDB, a unified multi-model database (MMDB) that natively stores graph, relational, and document data and efficiently processes graph-centric cross-model data integration and analytics (GCDIA). Its design includes a unified optimization framework for the integration step (GCDI) that exploits cross-model correlations.

Abstract

Graph-centric cross-model data integration and analytics (GCDIA) refer to tasks that leverage the graph model as a central paradigm to integrate relevant information across heterogeneous data models, such as relational and document, and subsequently perform complex analytics such as regression and similarity computation. As modern applications generate increasingly diverse data and move beyond simple retrieval toward advanced analytical objectives (e.g., prediction and recommendation), GCDIA has become increasingly important. Existing multi-model databases (MMDBs) struggle to efficiently support both integration (GCDI) and analytics (GCDA) in GCDIA. They typically separate graph processing from other models without global optimization for GCDI, while relying on tuple-at-a-time execution for GCDA, leading to limited performance and scalability. To address these limitations, we propose GredoDB, a unified MMDB that natively supports storing graph, relational, and document models, while efficiently processing GCDIA. Specifically, we design 1) topology- and attribute-aware graph operators for efficient predicate-aware traversal, 2) a unified GCDI optimization framework to exploit cross-model correlations, and 3) a parallel GCDA architecture that materializes intermediate results for operator-level execution. Experiments on the widely adopted multi-model benchmark M2Bench demonstrate that, in terms of response time, GredoDB achieves up to 107.89 times and an average of 10.89 times speedup on GCDI, and up to 356.72 times and an average of 37.79 times on GCDA, compared to state-of-the-art (SOTA) MMDBs.
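
To make the GCDIA workflow concrete, the following is a minimal toy sketch in plain Python, not GredoDB's implementation or API. All data, names, and helper functions (`customers`, `orders`, `follows`, `neighbors_where`, `tag_vector`, `cosine`) are hypothetical and chosen only for illustration. It shows a graph traversal that filters on relational attributes during expansion (integration), followed by a similarity computation over document data (analytics).

```python
from math import sqrt

# Hypothetical toy data, not taken from the paper or from M2Bench.
# Relational model: customer rows keyed by id.
customers = {
    1: {"name": "Ann", "age": 34},
    2: {"name": "Bob", "age": 51},
    3: {"name": "Cy", "age": 29},
}
# Document model: one nested order document per customer id.
orders = {
    1: {"items": [{"tag": "book", "qty": 2}, {"tag": "pen", "qty": 1}]},
    2: {"items": [{"tag": "book", "qty": 1}]},
    3: {"items": [{"tag": "pen", "qty": 4}]},
}
# Graph model: "follows" edges between customer ids.
follows = {1: [2, 3], 2: [3], 3: []}


def neighbors_where(src, predicate):
    """One-hop traversal that applies an attribute predicate while expanding."""
    return [dst for dst in follows[src] if predicate(customers[dst])]


def tag_vector(customer_id, tags):
    """Pull per-tag quantities for one customer out of the document model."""
    counts = dict.fromkeys(tags, 0)
    for item in orders[customer_id]["items"]:
        if item["tag"] in counts:
            counts[item["tag"]] += item["qty"]
    return [counts[t] for t in tags]


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


tags = ["book", "pen"]
# Integration step: traverse from customer 1, keeping neighbors younger than 40.
young = neighbors_where(1, lambda row: row["age"] < 40)
# Analytics step: similarity between customer 1 and each integrated neighbor.
scores = {n: cosine(tag_vector(1, tags), tag_vector(n, tags)) for n in young}
```

The sketch evaluates the predicate one neighbor at a time inside the traversal. The abstract attributes the reported speedups to different mechanisms (topology- and attribute-aware operators, cross-model optimization, and operator-level parallel execution over materialized intermediate results), none of which this toy code attempts to reproduce.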

Paper Structure (26 sections, 13 equations, 12 figures, 4 tables, 3 algorithms)

Figures (12)

  • Figure 2: An overview of the architecture of GredoDB. The bold black arrows indicate the primary execution pipeline, while the thin black and brown arrows respectively denote data flows in GCDI and GCDA processing.
  • Figure 3: An example of graph storage format.
  • Figure 4: An example of hybrid traversal operations.
  • Figure 5: An example of a pattern.
  • Figure 6: Optimization mechanisms for patterns with different predicate types. The left of the dashed line shows the types represented by the pattern, while the right shows GredoDB’s candidate or executed plans.
  • ...and 7 more figures

Theorems & Definitions (5)

  • Definition: Relational Model
  • Definition: Document Model
  • Definition: Graph Model
  • Definition: Adjacency Graph
  • Definition: Predicate