Predictive Query-based Pipeline for Graph Data

Plácido A Souza Neto

TL;DR

This work addresses scaling challenges in graph data analysis by leveraging graph embeddings (GraphSAGE, Node2Vec, FastRP) to project complex graphs into lower-dimensional spaces suitable for ML tasks. It proposes a predictive query-based pipeline that integrates data cleaning, graph modeling, multiple projection types (full, strict, strict-extended), and embedding generation, storing embeddings as node properties for flexible comparison. Through the Heart Disease and MovieLens scenarios, the paper demonstrates how embeddings facilitate similarity analysis, visualization, and prediction via Cypher queries and GDS-based KNN, illustrating practical pathways to answer expert queries in health contexts. The findings indicate that embedding choice, dimensionality, and projection type meaningfully influence predictive results, with FastRP often delivering strong discriminative power in health-status classification. The proposed pipeline offers a scalable, exploration-driven approach to transform graph data into actionable predictions and insights, with clear avenues for future work on model diversity and dataset breadth.
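
The KNN-based prediction mentioned above can be illustrated with a small plain-Python sketch. It is not the GDS KNN procedure. It scores labeled nodes by cosine similarity to a query node's embedding, which is read from a node property, and it predicts the query's label by majority vote over the top k. The patient IDs, the `fastrp` property name, the `status` label and the vectors are hypothetical toy data, not results from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(nodes, query_id, prop, label, k=3):
    """Predict `label` for `query_id` by majority vote over the k labeled nodes
    whose embedding (stored under property `prop`) is most cosine-similar."""
    query = nodes[query_id][prop]
    scored = [
        (cosine(query, props[prop]), node_id)
        for node_id, props in nodes.items()
        if node_id != query_id and label in props
    ]
    scored.sort(reverse=True)  # highest similarity first
    top = scored[:k]
    votes = Counter(nodes[node_id][label] for _, node_id in top)
    return votes.most_common(1)[0][0], top


# Hypothetical toy data: each embedding is stored as a node property.
patients = {
    "p1": {"fastrp": [0.9, 0.1, 0.0], "status": "disease"},
    "p2": {"fastrp": [0.8, 0.2, 0.1], "status": "disease"},
    "p3": {"fastrp": [0.1, 0.9, 0.2], "status": "healthy"},
    "p4": {"fastrp": [0.0, 0.8, 0.3], "status": "healthy"},
    "p5": {"fastrp": [0.85, 0.15, 0.05]},  # unlabeled query node
}

pred, neighbours = knn_predict(patients, "p5", "fastrp", "status", k=3)
print(pred)  # disease
```

You can pass a different `prop` name to run the same query against another embedding stored on the same nodes, such as a hypothetical Node2Vec or GraphSAGE vector. Comparisons of this kind are what storing embeddings as node properties makes easy.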

Abstract

Graphs are essential tools for modeling interconnected data, but they face challenges with massive datasets, where processing often becomes computationally expensive. Graph embedding techniques offer an efficient alternative: by projecting complex graphs into a lower-dimensional vector space, they simplify the analysis and processing of large-scale graphs. Several approaches, such as GraphSAGE, Node2Vec, and FastRP, offer efficient methods for generating graph embeddings. Storing embeddings as node properties makes it possible to compare different embedding techniques and evaluate their effectiveness for specific tasks. This flexibility allows embeddings to be updated dynamically and facilitates experimentation with different approaches. By analyzing these embeddings, one can extract insights into the relationships between nodes and their similarities within the embedding space.
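
The projection step can be made concrete with a simplified sketch in the spirit of FastRP, written in plain Python. It is not the GDS implementation. Each node starts with a very sparse random vector. The sketch then repeatedly averages neighbour vectors and sums the iterations with weights. Finally, it stores the result as a node property. The toy graph, the `fastrp` property name, the dimension, the sparsity `s` and the iteration weights are all assumptions chosen for illustration.

```python
import math
import random


def sparse_random_vector(dim, s, rng):
    """Very sparse random projection vector: each entry is +sqrt(s) or -sqrt(s)
    with probability 1/(2s) each, and 0 otherwise."""
    vec = []
    for _ in range(dim):
        r = rng.random()
        if r < 1 / (2 * s):
            vec.append(math.sqrt(s))
        elif r < 1 / s:
            vec.append(-math.sqrt(s))
        else:
            vec.append(0.0)
    return vec


def l2_normalize(v):
    """Scale v to unit length (returns a copy; zero vectors are left as zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fastrp_like(adj, dim=8, weights=(0.0, 1.0, 1.0), s=3, seed=42):
    """Simplified FastRP-style embedding (assumption, not the GDS algorithm).
    `adj` maps node -> list of neighbours. Each iteration propagates neighbour
    vectors and L2-normalizes them; the output is a weighted sum of iterations."""
    rng = random.Random(seed)
    current = {n: sparse_random_vector(dim, s, rng) for n in adj}
    result = {n: [0.0] * dim for n in adj}
    for w in weights:
        nxt = {}
        for n, nbrs in adj.items():
            acc = [0.0] * dim
            for m in nbrs:
                acc = [a + b for a, b in zip(acc, current[m])]
            # Normalizing the sum gives the same direction as normalizing the average.
            nxt[n] = l2_normalize(acc)
        current = nxt
        for n in adj:
            result[n] = [r + w * c for r, c in zip(result[n], current[n])]
    return result


# Hypothetical undirected toy graph; "a" and "b" share exactly the same neighbours.
adj = {
    "a": ["c", "d"],
    "b": ["c", "d"],
    "c": ["a", "b", "e"],
    "d": ["a", "b"],
    "e": ["c", "f"],
    "f": ["e"],
}
embeddings = fastrp_like(adj)

# Store each vector as a node property so other embeddings can sit alongside it.
nodes = {n: {} for n in adj}
for n, vec in embeddings.items():
    nodes[n]["fastrp"] = vec

print(nodes["a"]["fastrp"] == nodes["b"]["fastrp"])  # True
```

Nodes with identical neighbourhoods (here `a` and `b`) end up with identical vectors. This is one reason similarity in the embedding space can reflect structural similarity in the graph.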

Paper Structure

This paper contains 5 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Example of a MovieLens Graph.
  • Figure 2: From Graph to Embedding
  • Figure 3: Schema of MovieLens Graph.
  • Figure 4: TSNE, ISOMAP, MDS and Spectral Reduction with Node2Vec Embedding
  • Figure 5: TSNE, ISOMAP, MDS and Spectral Reduction with GraphSAGE Embedding
  • ...and 7 more figures