Real-World Graph Analysis: Techniques for Static, Dynamic, and Temporal Communities

Davide Rucci

TL;DR

The thesis tackles the problem of extracting meaningful communities from large graphs by enumerating subgraph patterns such as $k$-graphlets and $k$-cores across static, dynamic, and temporal settings. It advances the state of the art with novel enumeration techniques, notably an $O(k^2)$-amortized-time graphlet enumeration algorithm and the cache-aware CAGE design that dramatically speeds up practical runs. It extends community analysis to temporal graphs via a unified $(k,h,\Delta)$-core framework and demonstrates how temporal cores reveal backbone structures and dynamics, complemented by a densest-subgraph approach based on dynamic graph orientation with $(1+\epsilon)$-approximation guarantees. Together, these contributions yield scalable, output-sensitive tools for complete subgraph listing and robust temporal analysis, enabling deeper insight into real-world networks and practical large-scale graph analysis.

Abstract

Graphs are widely used in various fields of computer science. They have also found application in unrelated areas, leading to a diverse range of problems. These problems can be modeled as relationships between entities in various contexts, such as social networks, protein interactions in cells, and route maps. It is therefore natural to analyze these data structures with diverse approaches, whether numerical or structural, global or local, approximate or exact. In particular, the concept of community plays an important role in local structural analysis: it highlights the composition of the underlying graph while providing insight into the organization and importance of the nodes in a network. This thesis pursues the goal of extracting knowledge from different kinds of graphs, including static, dynamic, and temporal graphs, with a particular focus on their community substructures. To tackle this task we use combinatorial algorithms that can list all the communities in a graph according to different formalizations, such as cliques, $k$-graphlets, and $k$-cores. We first develop new algorithms to enumerate subgraphs, using traditional and novel techniques such as push-out amortization and CPU cache analysis to boost their efficiency. We then extend these concepts to the analysis of real-world graphs across diverse domains, ranging from social networks to autonomous systems modeled as temporal graphs. In this field, there is currently no widely accepted adaptation, even for straightforward subgraphs like $k$-cores, and the available data is growing in both quantity and scale. As a result, our findings advance the state of the art from both a theoretical and a practical perspective and can be used in a static or dynamic setting to further speed up and refine graph analysis techniques.
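To make two of the community formalizations named above concrete, here is a minimal Python sketch of a $k$-core (computed by the standard peeling procedure) and of $k$-graphlets (connected induced subgraphs on $k$ vertices, listed by brute force). It is only an illustration and not the thesis's algorithms: the brute-force listing takes time exponential in $k$, unlike the amortized and cache-aware techniques the abstract describes. The 5-vertex, 6-edge graph and the function names `k_core`, `is_connected`, and `k_graphlets` are assumptions introduced for this example.

```python
from itertools import combinations

def k_core(adj, k):
    """Vertex set of the k-core: repeatedly peel vertices of degree < k."""
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:          # already peeled via an earlier push
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] < k:
                    stack.append(u)
    return alive

def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, frontier = {start}, [start]
    while frontier:
        v = frontier.pop()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen == nodes

def k_graphlets(adj, k):
    """Brute force: every k-subset of vertices inducing a connected subgraph."""
    return [set(c) for c in combinations(sorted(adj), k) if is_connected(adj, c)]

# Hypothetical example graph (not the one from the thesis figures).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {v: set() for v in range(5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

two_core = k_core(adj, 2)           # vertex 4 has degree 1 and is peeled
three_graphlets = k_graphlets(adj, 3)
print(sorted(two_core), len(three_graphlets))
```

On this toy graph the 2-core drops the pendant vertex 4, while the 3-core is empty because peeling vertex 0 lowers the degrees of its neighbors below 3.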

Paper Structure (80 sections, 24 theorems, 11 equations, 22 figures, 8 tables, 7 algorithms)


Key Result

Proposition 2.1

$\mathsf{OutputP} = \mathsf{EnumP}$ if and only if $\mathsf{P} = \mathsf{NP}$.

Figures (22)

  • Figure 1: (a) a simple graph with 5 vertices and 6 edges; (b) all of its $3$-graphlets; (c) its $2$-core; (d) its maximal cliques; (e) some of its 3-edge subgraphs.
  • Figure 2: (a) an example graph with its $3$-graphlets below; (b) $k$-graphlet count in the Brady network [lasagne].
  • Figure 3: Amortizing the cost of the recursive call $C$ on its negative descendants: we charge $O(1)$ to each highlighted call. By the proof of Theorem \ref{th:ksquare-graphlet-complexity}, there are at least $d(x) / 2$ such calls, so the $O(d(x))$ cost can be amortized over them.
  • Figure 4: Percentage of failure leaves over total leaves in the recursion tree, for our whole dataset of 155 graphs and $k = 4, 5, 7, 9$.
  • Figure 5: The 4 possible ways of completing a $(k-3)$-graphlet under construction (contained in set $S$). Case 1: pick 3 vertices at distance 1 from $S$. Case 2: pick 1 neighbor of $S$ and 2 of its neighbors. Case 3: pick 2 neighbors of $S$ and then 1 of their neighbors; if $z$ is adjacent to both $u$ and $v$ (case 3a), divide the count by 2. Case 4: pick 1 neighbor of $S$, then 1 of its neighbors at distance 2 from $S$, and then 1 of that vertex's neighbors at distance 3 from $S$.
  • ...and 17 more figures

Theorems & Definitions (56)

  • Definition 2.1: Enumeration Problem
  • Definition 2.2: EnumP
  • Definition 2.3: Output Polynomial Time
  • Proposition 2.1 [capelli2017complexity]
  • Definition 2.4: Incremental Polynomial Time
  • Definition 2.5: Delay of an Enumeration Algorithm
  • Definition 2.6: Polynomial Delay
  • Proposition 2.2: Time complexity of the Extension Problem [DBLP:conf/stacs/MaryS16]
  • Definition 2.7: Constant Amortized Time
  • Theorem 2.1: Push-Out Amortization [Uno_pushout]
  • ...and 46 more