Network Embedding Exploration Tool (NEExT)
Ashkan Dehghan, Paweł Prałat, François Théberge
TL;DR
NEExT tackles the challenge of embedding collections of graphs by combining user-defined, interpretable node features with fast, Wasserstein-based graph embeddings from the Vectorizers toolkit. It constructs per-node feature vectors (e.g., LSME, centralities, Self-Walk, Expansion), then treats each graph as a probability distribution over feature space and embeds these distributions in a $d$-dimensional space using LOT/Sinkhorn/ApproximateWasserstein with SVD. The framework supports supervised feature selection (Greedy and Fast) and unsupervised feature discovery, and it scales through a node-feature sampling module. On synthetic ABCD graphs and real-world networks, NEExT achieves accuracy competitive with state-of-the-art methods while remaining interpretable, and sampling reduces computational cost with only a modest impact on performance. Overall, NEExT provides a practical, explainable toolkit for analyzing graph collections and can be extended to single graphs via ego-net aggregation.
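To make the pipeline above concrete, the following is a minimal sketch of the idea, not the NEExT or Vectorizers API. It assumes a single node feature (degree) in place of the richer features listed above, and it replaces the LOT/Sinkhorn step with a one-dimensional shortcut: in one dimension, the 2-Wasserstein distance between two distributions equals the $L^2$ distance between their quantile functions, so a vector of $d$ sampled quantiles serves as a $d$-dimensional embedding. All function names and the toy graphs are hypothetical and chosen for illustration only.

```python
from math import sqrt


def degree_features(edges, num_nodes):
    """Per-node feature: degree (a stand-in for LSME, centralities, etc.)."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def quantile_embedding(values, d=4):
    """Embed a 1-D empirical distribution as d quantiles taken at the
    midpoint levels (k + 0.5) / d. Illustrative assumption, not NEExT's method."""
    xs = sorted(values)
    n = len(xs)
    emb = []
    for k in range(d):
        q = (k + 0.5) / d
        idx = min(int(q * n), n - 1)  # index of the empirical q-quantile
        emb.append(float(xs[idx]))
    return emb


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Three toy graphs on 6 nodes each, given as edge lists (hypothetical data).
graphs = {
    "path": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
    "cycle": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
    "star": [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)],
}

# Graph -> distribution of node features -> fixed-length embedding.
emb = {
    name: quantile_embedding(degree_features(edges, 6), d=4)
    for name, edges in graphs.items()
}

for name, vec in emb.items():
    print(name, vec)
```

In this toy setting the path and the cycle land closer to each other than either does to the star, which matches the intuition that their degree distributions are similar. With multi-dimensional node features the quantile shortcut no longer applies, which is where the optimal-transport methods named above come in.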
Abstract
Many real-world and artificial systems and processes can be represented as graphs. Examples include social networks, financial transactions, supply chains, and molecular structures. In many of these cases, one needs to consider a collection of graphs rather than a single network. This could be a collection of distinct but related graphs, such as different protein structures, or graphs resulting from dynamic processes on the same network. Examples of the latter include the evolution of social networks, community-induced graphs, or ego-nets around various nodes. A common challenge is the absence of ground-truth labels for graphs or nodes, which necessitates unsupervised techniques for analyzing such systems. Moreover, even when ground-truth labels are available, many existing graph machine learning methods depend on complex deep learning models, which complicates model explainability and interpretability. To address some of these challenges, we have introduced NEExT (Network Embedding Exploration Tool) for embedding collections of graphs via user-defined node features. The advantages of the framework are twofold: (i) users can easily define their own interpretable node-based features suited to the task at hand, and (ii) graphs are embedded quickly using the Vectorizers library. In this paper, we demonstrate the usefulness of NEExT on collections of synthetic and real-world graphs. For supervised tasks, we show that NEExT achieves graph classification performance comparable to that of state-of-the-art techniques while maintaining model interpretability. Furthermore, our framework can also be used to generate high-quality embeddings in an unsupervised way, when target variables are not available.
