Applying Text Embedding Models for Efficient Analysis in Labeled Property Graphs
Michal Podstawski
TL;DR
The paper addresses how to exploit the rich textual attributes of labeled property graphs (LPGs) to improve semantic analysis. It introduces a lightweight, model-agnostic pipeline that serializes textual properties, encodes them with pretrained text embedding models, and applies standard classifiers to node-label and relation-prediction tasks without changing the graph structure. Across diverse Neo4j datasets, the approach yields strong accuracy and interpretable predictions, demonstrating the practical value of integrating textual semantics into LPG analytics. The paper also discusses limitations and outlines future directions for scaling to real-world settings, including hybrid models that combine text embeddings with topology-aware encoders and support for dynamic graphs.
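The serialize, embed, classify pipeline summarized above can be sketched without any dependencies. This is a minimal illustration under stated assumptions, not the paper's implementation: `embed` is a hashed bag-of-words stand-in for a pretrained text embedding model, and the nearest-centroid classifier stands in for a "standard classifier". The function names and the toy movie-graph nodes are hypothetical.

```python
import hashlib
import math
import re

DIM = 512  # hypothetical width of the stand-in embedding


def serialize_properties(props: dict) -> str:
    """Flatten a property map into 'key: value' text, sorted by key for determinism.

    The target label must not appear in `props`, otherwise it leaks into the features.
    """
    return "; ".join(f"{key}: {value}" for key, value in sorted(props.items()))


def embed(text: str, dim: int = DIM) -> list[float]:
    """Stand-in for a pretrained text encoder: an L2-normalized hashed bag of words.

    blake2b is used instead of hash() because str hashing is randomized per process.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
        vec[int.from_bytes(digest, "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero on empty text
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(vectors: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Nearest-centroid training: average the embeddings that share a label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids: dict[str, list[float]], vec: list[float]) -> str:
    """Return the label whose centroid is most cosine-similar to `vec`."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical toy nodes loosely modeled on a movie graph; not data from the paper.
train = [
    ({"title": "The Matrix", "tagline": "Welcome to the real world", "released": 1999}, "Movie"),
    ({"title": "Top Gun", "tagline": "I feel the need for speed", "released": 1986}, "Movie"),
    ({"name": "Keanu Reeves", "born": 1964}, "Person"),
    ({"name": "Tom Cruise", "born": 1962}, "Person"),
]
centroids = fit_centroids(
    [embed(serialize_properties(props)) for props, _ in train],
    [label for _, label in train],
)

query = {"name": "Carrie-Anne Moss", "born": 1967}
print(predict(centroids, embed(serialize_properties(query))))
```

Swapping `embed` for a real pretrained encoder and the centroid step for, say, logistic regression would leave the serialization and train/predict structure unchanged, which is what makes this kind of pipeline model-agnostic.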
Abstract
Labeled property graphs often contain rich textual attributes that can enhance analytical tasks when properly leveraged. This work explores the use of pretrained text embedding models to enable efficient semantic analysis in such graphs. By embedding textual node and edge properties, we support downstream tasks, including node classification and relation prediction, with improved contextual understanding. Our approach integrates language model embeddings into the graph analysis pipeline without altering the graph structure, demonstrating that textual semantics can significantly enhance the accuracy and interpretability of property graph analysis.
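One way to write the node- and edge-level setup down is the formulation below. It is an illustrative assumption, not notation taken from the paper: $s(\cdot)$ serializes a property map to text, $f$ is a frozen pretrained text encoder producing $d$-dimensional vectors, and $g$ and $h$ are ordinary classifiers over the node label set $\mathcal{C}$ and relation type set $\mathcal{R}$. Concatenating the endpoint and edge embeddings for relation prediction is one common choice among several.

```latex
\begin{aligned}
\mathbf{z}_v &= f\big(s(\mathrm{props}(v))\big) \in \mathbb{R}^{d},
  \qquad
\mathbf{z}_e = f\big(s(\mathrm{props}(e))\big) \in \mathbb{R}^{d},\\
\hat{y}_v &= \arg\max_{c \in \mathcal{C}} \; g_c(\mathbf{z}_v)
  && \text{(node classification)},\\
\mathbf{x}_{e} &= \big[\,\mathbf{z}_u \,\|\, \mathbf{z}_v \,\|\, \mathbf{z}_e\,\big] \in \mathbb{R}^{3d},
  \qquad e = (u, v),\\
\hat{r}_{e} &= \arg\max_{r \in \mathcal{R}} \; h_r(\mathbf{x}_{e})
  && \text{(relation prediction)}.
\end{aligned}
```

The embeddings are computed per node or edge and used only as features, so the graph itself is never modified. Under this formulation, $\mathbf{z}_e$ is available only when the edge is observed and its type is what must be predicted; for predicting missing edges it would be dropped, giving $\mathbf{x}_e = [\,\mathbf{z}_u \,\|\, \mathbf{z}_v\,] \in \mathbb{R}^{2d}$.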
