Network Embedding Exploration Tool (NEExT)

Ashkan Dehghan, Paweł Prałat, François Théberge

TL;DR

NEExT tackles the challenge of embedding collections of graphs by enabling user-defined, interpretable node features and fast, Wasserstein-based graph embeddings via the Vectorizers toolkit. It constructs per-node feature vectors (e.g., LSME, centralities, Self-Walk, Expansion), then treats each graph as a probability distribution over feature space and embeds these distributions in a $d$-dimensional space using LOT/Sinkhorn/ApproximateWasserstein with SVD. The framework supports supervised feature selection (Greedy and Fast) and unsupervised feature discovery, and it scales through a node-feature sampling module. Across synthetic ABCD graphs and real-world networks, NEExT achieves accuracy competitive with state-of-the-art methods while maintaining interpretability, and sampling reduces computational cost with modest impact on performance. Overall, NEExT provides a practical, explainable toolkit for analyzing graph collections and can be extended to single graphs via ego-net aggregation.

Abstract

Many real-world and artificial systems and processes can be represented as graphs. Some examples of such systems include social networks, financial transactions, supply chains, and molecular structures. In many of these cases, one needs to consider a collection of graphs, rather than a single network. This could be a collection of distinct but related graphs, such as different protein structures or graphs resulting from dynamic processes on the same network. Examples of the latter include the evolution of social networks, community-induced graphs, or ego-nets around various nodes. A significant challenge commonly encountered is the absence of ground-truth labels for graphs or nodes, necessitating the use of unsupervised techniques to analyze such systems. Moreover, even when ground-truth labels are available, many existing graph machine learning methods depend on complex deep learning models, complicating model explainability and interpretability. To address some of these challenges, we have introduced NEExT (Network Embedding Exploration Tool) for embedding collections of graphs via user-defined node features. The advantages of the framework are twofold: (i) the ability to easily define your own interpretable node-based features in view of the task at hand, and (ii) fast embedding of graphs provided by the Vectorizers library. In this paper, we demonstrate the usefulness of NEExT on collections of synthetic and real-world graphs. For supervised tasks, we demonstrate that NEExT achieves graph classification performance comparable to other state-of-the-art techniques while maintaining model interpretability. Furthermore, our framework can also be used to generate high-quality embeddings in an unsupervised way, where target variables are not available.
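
The pipeline summarized above (interpretable node features, each graph treated as a distribution over feature space, a Wasserstein-style embedding followed by SVD) can be illustrated with a minimal NumPy-only sketch. It is not NEExT's implementation and does not call the Vectorizers library. The quantile-based embedding is a simplified stand-in for LOT/Sinkhorn/ApproximateWasserstein that captures only per-feature (marginal) Wasserstein structure. The Erdős–Rényi generator (used here instead of ABCD), the two node features, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def random_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi G(n, p) graph (illustrative only)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


def node_features(adj):
    """Two interpretable per-node features: degree centrality and local clustering."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # diag(A^3)[i] counts closed walks of length 3 at node i, i.e. twice its triangles.
    triangles = np.diag(adj @ adj @ adj) / 2.0
    pairs = deg * (deg - 1) / 2.0
    clustering = np.divide(triangles, pairs, out=np.zeros(n), where=pairs > 0)
    return np.column_stack([deg / (n - 1), clustering])


def quantile_embedding(feature_sets, n_quantiles=20):
    """Represent each graph's node-feature distribution by per-feature quantiles.

    In one dimension, the 2-Wasserstein distance between two distributions is the
    L2 distance between their quantile functions, so Euclidean distances between
    these vectors approximate marginal Wasserstein distances, one feature at a time.
    """
    qs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    rows = [np.quantile(f, qs, axis=0).T.ravel() for f in feature_sets]
    return np.vstack(rows) / np.sqrt(n_quantiles)


def svd_reduce(x, d):
    """Project centred rows of x onto their top-d singular directions."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :d] * s[:d]


rng = np.random.default_rng(0)
densities = np.repeat([0.1, 0.4], 10)  # two groups of ten graphs each
graphs = [random_graph(60, p, rng) for p in densities]
features = [node_features(a) for a in graphs]
embedding = svd_reduce(quantile_embedding(features), d=2)  # one 2-D point per graph
```

Each row of `embedding` is one graph, so the array can be passed directly to a classifier, a regressor, or a scatter plot. Because every input column is a named node feature, the features driving the separation can be inspected one at a time.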

Paper Structure

This paper contains 20 sections, 1 equation, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Examples of graphs generated using the ABCD synthetic graph for Experiment 1, as detailed in Table \ref{table:abcd_experiment_details_table}, for $\xi \in \{0.1, 0.2, 0.35, 0.8\}$. Ground-truth communities are represented with different colours.
  • Figure 2: Two-dimensional graph embeddings built using the approximate Wasserstein technique and graph features built using the Expansion, LSME, and PageRank node features. The dimension for all the above node embeddings is set to $5$.
  • Figure 3: Mean-absolute-error measured for a regression model built to predict $\xi$ in Experiment 1, as defined in Table \ref{table:abcd_experiment_details_table}. The $x$-axis is the length of the feature vectors computed on each graph. The final graph embedding is uniformly set to $d=2$.
  • Figure 4: Two-dimensional representations of the approximate Wasserstein graph embeddings built using LSME, Closeness Centrality, and Degree Centrality graph features. Colours correspond to different values of noise ($\xi$), and the size of the underlying graph ($n$) is shown inside each data point.
  • Figure 5: Left: Accuracy of binary classifiers built for models M-0 to M-8. Right: two-dimensional representation of graph embedding vectors built using features in M-8.
  • ...and 5 more figures