Machine Learning for Static and Single-Event Dynamic Complex Network Analysis

Nikolaos Nakis

TL;DR

This thesis addresses the challenge of learning compact, interpretable representations for static and single-event dynamic networks. It introduces a family of Euclidean latent-distance models (HBDM, HM-LDM, SLDM, SLIM, sHM-LDM, and DISEE) that cover unsigned, signed, bipartite, and single-event temporal networks with unified learning procedures. The work delivers scalable, linearithmic algorithms, multi-scale hierarchical and polytope-based embeddings, and identifiability guarantees, achieving strong performance on link prediction, node classification, and community detection while enabling rich visualizations. It also provides a principled generative framework for polarization and temporal impact in networks, with public code and reproducible experiments. Overall, the thesis advances Graph Representation Learning (GRL) by delivering unified, scalable, and interpretable embeddings across diverse network types and tasks.

Abstract

The primary objective of this thesis is to develop novel algorithmic approaches for Graph Representation Learning of static and single-event dynamic networks. To that end, we focus on the family of Latent Space Models, and more specifically on the Latent Distance Model, which naturally conveys important network characteristics such as homophily, transitivity, and balance theory. Furthermore, this thesis aims to create structure-aware network representations, which lead to hierarchical expressions of network structure, community characterization, the identification of extreme profiles in networks, and the quantification of impact dynamics in temporal networks. Crucially, the methods presented are designed to define unified learning processes, eliminating the need for heuristics and multi-stage processes such as post-processing steps. Our aim is to work towards unified network embeddings that are both comprehensive and powerful, capable of characterizing network structures and handling the diverse tasks that graph analysis offers.

Paper Structure

This paper contains 77 sections, 1 theorem, 43 equations, 31 figures, 1 table.

Key Result

Lemma 8.4.1

Let $\mathcal{G}=(\mathcal{V}, \mathcal{E})$ be a graph and let $\mathcal{C}$ be a cluster with its centroid located at $\boldsymbol{\mu}\in \mathbb{R}^D$ having an edge $(i,j)\in \mathcal{E}$ for some $i\in \mathcal{C}$ and $j\in \mathcal{V}\backslash\mathcal{C}$ such that $\mathbf{z}_i \neq \bolds

Figures (31)

  • Figure 1: Examples of three different types of networks based on their temporal structure. Round points represent network nodes, square points make up the corresponding colored node dyads, arrows represent directed relationships between two nodes, vertical lines represent events, and black lines are the timelines, while grey bold lines show that a link (event) appeared once and cannot be observed again. Left panel: Static networks, where links occur once and no temporal information is available. Middle panel: Temporal networks, where links are events in time and can be observed multiple times along the timeline. Right panel: Single-event networks, where links appear in a temporal manner but can occur only once, defining edges as single events.
  • Figure 2: Downstream tasks for Graph Representation Learning. (a) Link prediction: In this setting, the network is partially observed and the task is to predict the missing links and regain the original network structure. (b) Node classification: In this setting, each network node has a label (in the example we have two labels $a$ and $b$), and the task is to infer the labels of nodes with missing/unknown labels. (c) Community detection: In this setting, the whole network is observed and the task is to infer communities existing in the network (we show an example with two communities $A$ and $B$).
  • Figure 3: Expression of homophily and transitivity as imposed by the Latent Distance model. Black lines correspond to network edges. Connected nodes are positioned close to each other to define a high probability of an edge, e.g. pairs $\{i,j\}$ and $\{j,k\}$. Consequently, the distance of node pair $\{i,k\}$ is bounded by the triangle inequality, and thus node pair $\{i,k\}$ has to also be positioned in close proximity.
  • Figure 4: Log-log plot of the number of network edges versus $N\log N$, where $N$ is the number of vertices, for $70$ datasets of the SNAP library [snapnets].
  • Figure 5: Schematic representation of the distance matrix calculation for a hierarchical structure with a tree of height $L=3$ and $N=64$ observations. (a) Hierarchical representation of the all-pairs distance matrix. (b) Pairwise distance approximation based on cluster centroids across different levels of the hierarchy [hbdm].
  • ...and 26 more figures
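
The transitivity argument in the Figure 3 caption can be illustrated with a small numerical sketch. The logistic link, the bias `alpha`, and the toy embeddings below are assumptions made for illustration only; the thesis's own likelihood and parameterization may differ. The sketch only shows that, under any link function that decreases with Euclidean distance, the triangle inequality yields a lower bound on the edge probability of the pair $\{i,k\}$.

```python
import numpy as np


def pairwise_distances(Z):
    """Euclidean distance between every pair of rows of Z (shape N x D)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def edge_probabilities(Z, alpha=1.0):
    """Assumed logistic latent distance model: p_ij = sigmoid(alpha - ||z_i - z_j||)."""
    return sigmoid(alpha - pairwise_distances(Z))


# Hypothetical 2-D embeddings for three nodes i, j, k.
Z = np.array([[0.0, 0.0],
              [0.5, 0.0],
              [0.5, 0.5]])
alpha = 1.0
i, j, k = 0, 1, 2

D = pairwise_distances(Z)
P = edge_probabilities(Z, alpha)

# Triangle inequality: d(i,k) <= d(i,j) + d(j,k).
triangle_ok = D[i, k] <= D[i, j] + D[j, k]

# Because the link decreases with distance, p_ik is bounded from below
# by the probability evaluated at the summed distance.
lower_bound = sigmoid(alpha - (D[i, j] + D[j, k]))
bound_ok = P[i, k] >= lower_bound

print(triangle_ok, bound_ok)
```

Placing $i$ close to $j$ and $j$ close to $k$ therefore forces a minimum probability for the edge $\{i,k\}$, which is the sense in which the model encodes transitivity.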

Theorems & Definitions (4)

  • Lemma 8.4.1
  • proof
  • Definition 1: Identifiability
  • Definition 2: Community champion