Siren Federate: Bridging document, relational, and graph models for exploratory graph analysis

Georgeta Bordea, Stephane Campinas, Matteo Catena, Renaud Delbru

TL;DR

Siren Federate tackles the challenge of interactive exploratory analysis over billions of heterogeneous knowledge graph items by unifying document-oriented information retrieval, relational processing, and graph processing within Elasticsearch. It introduces distributed join algorithms, adaptive query planning, query plan folding, semantic caching, and a novel Semi-Join Decomposition technique that mitigates the explosion of intermediate results in path queries. The authors validate these ideas through large-scale benchmarks, LDBC Finbench, and insights from real-world deployments, reporting sub-second to second-scale latency that holds up under large data volumes and high concurrency. By bridging multiple data models and enabling efficient path and graph analytics at scale, Siren Federate offers a practical foundation for investigative intelligence platforms that require rich search, graph exploration, and multi-modal data handling.

Abstract

Investigative workflows require interactive exploratory analysis on large heterogeneous knowledge graphs. Current databases show limitations in enabling such tasks. This paper discusses the architecture of Siren Federate, a system that efficiently supports exploratory graph analysis by bridging document-oriented, relational, and graph models. Technical contributions include distributed join algorithms, adaptive query planning, query plan folding, semantic caching, and semi-join decomposition for path queries. Semi-join decomposition addresses the exponential growth of intermediate results in path-based queries. Experiments show that Siren Federate exhibits low latency and scales well with the amount of data, the number of users, and the number of computing nodes.

Paper Structure

This paper contains 42 sections, 1 theorem, 16 figures, 1 table.

Key Result

Theorem 1

For each subsequent decomposition $q_i$, all semi-join operations except one are identical to those performed by the previous decomposition $q_{i-1}$, benefiting directly from cached results.

Figures (16)

  • Figure 1: An example graph
  • Figure 2: The related ontology
  • Figure 3: The example graph in a document-oriented store
  • Figure 4: The ontology after reifications
  • Figure 5: A staged logical query plan
  • ...and 11 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof