Discovering Communication Pattern Shifts in Large-Scale Labeled Networks using Encoder Embedding and Vertex Dynamics

Cencheng Shen, Jonathan Larson, Ha Trinh, Xihan Qin, Youngser Park, Carey E. Priebe

TL;DR

A temporal encoder embedding method that leverages ground-truth or estimated vertex labels to embed large-scale graph data efficiently, processing billions of edges within minutes. The embedding also yields a temporal dynamic statistic capable of detecting communication pattern shifts at all levels, from individual vertices to vertex communities and the overall graph structure.

Abstract

Analyzing large-scale time-series network data, such as social media and email communications, poses a significant challenge in understanding social dynamics, detecting anomalies, and predicting trends. In particular, the scalability of graph analysis is a critical hurdle impeding progress in large-scale downstream inference. To address this challenge, we introduce a temporal encoder embedding method. This approach leverages ground-truth or estimated vertex labels, enabling an efficient embedding of large-scale graph data and the processing of billions of edges within minutes. Furthermore, this embedding unveils a temporal dynamic statistic capable of detecting communication pattern shifts across all levels, ranging from individual vertices to vertex communities and the overall graph structure. We provide theoretical support to confirm its soundness under random graph models, and demonstrate its numerical advantages in capturing evolving communities and identifying outliers. Finally, we showcase the practical application of our approach by analyzing an anonymized time-series communication network from a large organization spanning 2019-2020, enabling us to assess the impact of Covid-19 on workplace communication patterns.
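
The abstract does not spell out the embedding formula. As a rough illustration only, the sketch below assumes a one-hot encoder form $Z_t = A_t W$, where $W$ is the label matrix with each column scaled by the inverse class size. It also assumes that a vertex's shift is measured as the Euclidean distance from its embedding at the first time step. The function names, the toy graph, and the choice of distance are hypothetical and not taken from the paper.

```python
import numpy as np


def encoder_weights(labels, num_classes):
    """One-hot label matrix with each column scaled by 1 / class size (assumed form)."""
    n = labels.shape[0]
    W = np.zeros((n, num_classes))
    W[np.arange(n), labels] = 1.0
    counts = W.sum(axis=0)
    counts[counts == 0] = 1.0  # guard against empty classes
    return W / counts


def temporal_encoder_embedding(adjacency_list, labels, num_classes):
    """Embed every time step with the same label-based weights: Z_t = A_t @ W."""
    W = encoder_weights(labels, num_classes)
    return [A @ W for A in adjacency_list]


def vertex_shift(embeddings):
    """Assumed vertex-level statistic: distance of each vertex from its first embedding."""
    Z0 = embeddings[0]
    return np.stack([np.linalg.norm(Z - Z0, axis=1) for Z in embeddings])


# Toy example: 4 vertices, 2 communities, 2 time steps.
labels = np.array([0, 0, 1, 1])

# t = 0: vertices talk only within their own community (edges 0-1 and 2-3).
A0 = np.array([[0, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=float)

# t = 1: vertex 0 stops talking to vertex 1 and talks to vertex 2 instead.
A1 = np.array([[0, 0, 1, 0],
               [0, 0, 0, 0],
               [1, 0, 0, 1],
               [0, 0, 1, 0]], dtype=float)

Z = temporal_encoder_embedding([A0, A1], labels, num_classes=2)
shift = vertex_shift(Z)  # shape: (time steps, vertices)
print(shift)
```

In this toy graph, vertex 0 shows the largest shift at the second time step, because its communication moved from its own community to the other one. Each time step costs one product between the adjacency matrix and an $n \times K$ matrix, so for large graphs a sparse adjacency representation (for example `scipy.sparse`) would be the natural substitute for the dense arrays used here.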

Paper Structure

This paper contains 20 sections, 4 theorems, 15 equations, and 8 figures.

Key Result

Theorem 1

Assuming the conditional independent random graph model, the temporal encoder embedding converges to a conditional expectation. Specifically, for a vertex $i$ belonging to community $y$, we have that where $a_{t}(i,:) \in \mathbb{R}^{K}$ satisfies

Figures (8)

  • Figure 1: This figure compares the encoder embedding with all labels known, the encoder embedding without labels, the unfolded spectral embedding, and the graph convolutional neural network. The average running time, covering both the embedding and the computation of dynamic statistics, was computed over 10 replicates. The experiments were conducted with a fixed $K=20$. The left panel varies $n$ from $5000$ to $50000$ with a fixed $t=10$, while the right panel varies $t$ from $10$ to $100$ with a fixed $n=5000$.
  • Figure 2: 3D Visualization of the first 3 communities' vertices at three different times for the simulated graph. The graph dynamic at each time is shown on top.
  • Figure 3: Visualization of the vertex dynamic statistics as time progresses. For the first 3 panels, the y-axis represents the number of vertices, while the x-axis represents the extent to which the vertices have shifted. As time increases, more vertices start to shift away from their starting positions due to noise. The last panel shows the percentage of vertices exceeding the vertex dynamic threshold at the last time step.
  • Figure 4: This figure compares temporal encoder embedding and unfolded spectral embedding in detecting $10$ extreme outliers.
  • Figure 5: This figure visualizes how the encoder embedding successfully detects the changing communication pattern despite community label changes.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof