Storing and Querying Evolving Graphs in NoSQL Storage Models

Alexandros Spitalas, Anastasios Gounaris, Andreas Kosmatopoulos, Kostas Tsichlas

TL;DR

This paper addresses the challenge of storing and querying evolving graphs with full historical context in NoSQL storage. It introduces a vertex-centric HiNode storage model implemented in MongoDB, designed to store histories compactly and to support efficient local and, especially, global temporal queries through in-database processing. The authors compare MongoDB against Cassandra across snapshot-based and time-interval datasets, demonstrating substantial gains in query speed and reduced client memory usage, and they extend the evaluation to streaming and OLTP/OLAP workloads with ACID transactions. A key contribution is the demonstration that a vertex-centric approach in a document store such as MongoDB can outperform traditional Cassandra-based implementations for complex global queries, while also enabling incremental streaming and what-if analyses via TimeLapse-inspired concepts. The work also provides initial historical-graph datasets generated from the LDBC SNB suite, laying the groundwork for future temporal-graph workload benchmarks and for broader use of time-aware graph analytics in practice.

Abstract

This paper investigates advanced storage models for evolving graphs, focusing on the efficient management of historical data and the optimization of global query performance. Evolving graphs, which represent dynamic relationships between entities over time, present unique challenges in preserving their complete history while supporting complex analytical queries. We first briefly review the current state of the art, focusing mainly on distributed historical graph databases, to provide the context for our proposals. We investigate the implementation of an enhanced vertex-centric storage model in MongoDB that prioritizes space efficiency by leveraging in-database query mechanisms to minimize redundant data and reduce storage costs. To ensure broad applicability, we employ datasets, some of which are generated with the LDBC SNB generator, appropriately post-processed to utilize both snapshot- and interval-based representations. Our experimental results, in both centralized and distributed infrastructures, demonstrate significant improvements in query performance, particularly for resource-intensive global queries that traditionally suffer from inefficiencies in entity-centric frameworks. The proposed model achieves these gains by optimizing memory usage, reducing client involvement, and exploiting the computational capabilities of MongoDB. By addressing key bottlenecks in the storage and processing of evolving graphs, this study takes a step toward a robust and scalable framework for managing dynamic graph data. This work contributes to the growing field of temporal graph analytics by enabling more efficient exploration of historical data and facilitating real-time insights into the evolution of complex networks.
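
To make the vertex-centric, interval-based idea more concrete, the following is a minimal in-memory sketch. It assumes a hypothetical record layout (one record per vertex, with its edge history embedded as half-open validity intervals); the field names and the helpers `alive`, `one_hop`, and `degree_distribution` are illustrative assumptions, not the paper's actual schema or implementation. It shows one local query (one-hop neighbours) and one global query (degree distribution) evaluated at a single time point.

```python
from collections import Counter

# Hypothetical vertex-centric layout: one record per vertex, with its whole
# edge history embedded as half-open validity intervals [start, end).
# Field names are illustrative assumptions, not the paper's schema.
vertices = [
    {"_id": "a", "start": 0, "end": 10,
     "out_edges": [{"target": "b", "start": 0, "end": 5},
                   {"target": "c", "start": 3, "end": 10}]},
    {"_id": "b", "start": 0, "end": 10,
     "out_edges": [{"target": "c", "start": 6, "end": 10}]},
    {"_id": "c", "start": 2, "end": 10, "out_edges": []},
]


def alive(item, t):
    """True if the interval [start, end) of a vertex or edge contains t."""
    return item["start"] <= t < item["end"]


def one_hop(vertex_id, t):
    """Local query: out-neighbours of a single vertex at time t."""
    for v in vertices:
        if v["_id"] == vertex_id and alive(v, t):
            return sorted(e["target"] for e in v["out_edges"] if alive(e, t))
    return []


def degree_distribution(t):
    """Global query: maps out-degree to the number of vertices alive at t."""
    return Counter(
        sum(alive(e, t) for e in v["out_edges"])
        for v in vertices if alive(v, t)
    )
```

In this sketch the global query scans every vertex record on the client side. The abstract's point is that such scans are better pushed into the database, so a real deployment would presumably express the same filter-and-count logic with the store's own query mechanisms rather than in application code.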

Paper Structure

This paper contains 32 sections, 19 figures, 9 tables.

Figures (19)

  • Figure 1: OneHop query comparison for the Hep-Ph dataset in the Cluster.
  • Figure 2: OneHop query comparison for the US Patents dataset in the Cluster.
  • Figure 3: Degree Distribution Comparison for the Hep-Ph dataset.
  • Figure 4: Degree Distribution Comparison for the Hep-Th dataset.
  • Figure 5: Degree Distribution Comparison for the US Patents dataset.
  • ...and 14 more figures