GraphLake: A Purpose-Built Graph Compute Engine for Lakehouse

Shige Liu, Songting Chen, Chengjie Qin, Mingxi Wu, Jianguo Wang

TL;DR

GraphLake is built on top of the commercial graph database TigerGraph and introduces a series of techniques to ensure query efficiency over Lakehouse tables, including a graph-aware caching mechanism and two Lakehouse-optimized parallel primitives.

Abstract

In this paper, we introduce GraphLake, a purpose-built graph compute engine for Lakehouse. GraphLake is built on top of the commercial graph database TigerGraph. It maps Lakehouse tables to vertex and edge types in a labeled property graph and supports graph analytics over Lakehouse tables using GSQL. To minimize startup time, it loads only the graph topology. Furthermore, it introduces a series of techniques to ensure query efficiency over Lakehouse tables, including a graph-aware caching mechanism and two Lakehouse-optimized parallel primitives. Extensive experiments demonstrate that GraphLake significantly outperforms PuppyGraph, the current state-of-the-art graph compute engine for Lakehouse, by achieving both lower startup and query time.
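The idea of loading only the graph topology at startup can be illustrated with a small sketch. This is not GraphLake's implementation or API: the class `TopologyOnlyGraph`, its methods, and the in-memory `person_rows` and `knows_rows` tables are hypothetical stand-ins for Lakehouse tables, chosen only to show how one table can map to a vertex type and another to an edge type, with property values read from the table on demand rather than at load time.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical stand-ins for Lakehouse tables: plain lists of row dicts.
person_rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Bob", "city": "Paris"},
    {"id": 3, "name": "Cy", "city": "Rome"},
]
knows_rows = [
    {"src": 1, "dst": 2, "since": 2019},
    {"src": 2, "dst": 3, "since": 2021},
]


@dataclass
class TopologyOnlyGraph:
    """Illustrative only: maps one table to a vertex type and one to an edge type."""

    vertex_table: list
    edge_table: list
    adjacency: dict = field(default_factory=lambda: defaultdict(list))
    row_of: dict = field(default_factory=dict)
    property_reads: int = 0  # counts on-demand reads from the vertex table

    def load_topology(self):
        # Startup touches only ID columns: vertex IDs and edge endpoints.
        for pos, row in enumerate(self.vertex_table):
            self.row_of[row["id"]] = pos
        for row in self.edge_table:
            self.adjacency[row["src"]].append(row["dst"])

    def neighbors(self, vid):
        # Traversal is answered from the loaded topology alone.
        return list(self.adjacency.get(vid, []))

    def get_property(self, vid, name):
        # Property values stay in the table and are fetched only when a query asks.
        self.property_reads += 1
        return self.vertex_table[self.row_of[vid]][name]


graph = TopologyOnlyGraph(person_rows, knows_rows)
graph.load_topology()
friends = graph.neighbors(1)
friend_names = [graph.get_property(v, "name") for v in friends]
```

In this sketch, traversals never touch property columns, so startup cost depends on the ID columns rather than on the full table size. A real engine would additionally cache fetched properties, which the sketch omits.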

Paper Structure (27 sections, 16 figures, 2 tables)

Figures (16)

  • Figure 1: Balance of Startup Time and Query Time
  • Figure 2: System Overview
  • Figure 3: Topology-Only Loading
  • Figure 4: ID Column Size vs. Table Size in LDBC_SNB SF100
  • Figure 5: Cache Management
  • ...and 11 more figures