Analyzing the Evolution of Graphs and Texts

Xingzhi Guo

TL;DR

This dissertation aims to efficiently model the dynamics of graphs and to understand changes in texts. It uses the Personalized PageRank algorithm to build effective dynamic network embeddings for evolving graphs, and it studies self-presented occupational identities in Twitter users' biographies over five years.

Abstract

With recent advances in representation learning algorithms for graphs (e.g., DeepWalk/GraphSage) and natural language (e.g., Word2Vec/BERT), state-of-the-art models can even achieve human-level performance on many downstream tasks, particularly node and sentence classification. However, most algorithms focus on large-scale models for static graphs and text corpora, without considering their inherently dynamic characteristics or uncovering the reasons behind the changes. This dissertation aims to efficiently model the dynamics of graphs (such as social networks and citation graphs) and to understand changes in texts (specifically news titles and personal biographies). To achieve this goal, we use the well-known Personalized PageRank algorithm to create effective dynamic network embeddings for evolving graphs. Our proposed approaches significantly improve running time and accuracy, both for detecting abnormal network intruders and for discovering shifts in entity meaning over large-scale dynamic graphs. For text changes, we analyze post-publication changes in news titles to understand the intents behind the edits and discuss the potential impact of title changes from an information integrity perspective. Moreover, we investigate self-presented occupational identities in Twitter users' biographies over five years, examining the effects of job prestige and demographics on how people disclose jobs and quantifying over-represented jobs and their transitions over time.

Paper Structure

This paper contains 198 sections, 15 theorems, 58 equations, 38 figures, 45 tables, 9 algorithms.

Key Result

Lemma 1

ForwardLocalPush has the following invariant property
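
The statement of the lemma is cut off on this page, so the sketch below does not reproduce the thesis's formulation. It is a minimal Python illustration of a forward local push routine in the style of Andersen et al., assuming an undirected graph stored as adjacency lists with no isolated nodes. The names `forward_local_push`, `alpha`, and `eps` are illustrative choices, not taken from the thesis. In this variant, the estimate vector `p` and the residual vector `r` always hold total mass 1, which is one simple consequence of the invariant commonly stated for local push, $\pi_s = p_s + \sum_v r_s(v)\,\pi_v$.

```python
from collections import deque


def forward_local_push(graph, source, alpha=0.15, eps=1e-4):
    """Approximate the Personalized PageRank vector of `source`.

    graph : dict mapping node -> list of neighbours (assumed: every node
            has at least one neighbour, i.e. no dangling nodes).
    alpha : teleport probability (illustrative default).
    eps   : per-degree residual threshold (illustrative default).
    Returns (p, r): the estimate and residual vectors as dicts.
    """
    p = {}                      # PPR estimates
    r = {source: 1.0}           # residuals; all mass starts at the source
    queue = deque([source])
    in_queue = {source}

    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        deg_u = len(graph[u])
        r_u = r.get(u, 0.0)
        if r_u < eps * deg_u:
            continue            # residual too small to be worth pushing

        # Push: keep an alpha fraction at u, spread the rest to neighbours.
        p[u] = p.get(u, 0.0) + alpha * r_u
        r[u] = 0.0
        share = (1.0 - alpha) * r_u / deg_u
        for v in graph[u]:
            r[v] = r.get(v, 0.0) + share
            if v not in in_queue and r[v] >= eps * len(graph[v]):
                queue.append(v)
                in_queue.add(v)
    return p, r
```

On termination every residual satisfies `r[v] < eps * deg(v)`, so a smaller `eps` gives a more precise estimate at the cost of more push operations.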

Figures (38)

  • Figure 1: The illustration of word embedding algorithms.
  • Figure 2: The illustration of network embeddings.
  • Figure 3: The example of global PageRank (Left) and Personalized PageRank for node 7 (Right). Their PageRank values are below the nodes and colored blue. Personalized PageRank measures the neighbor importance given one starting node, while Global PageRank measures the overall importance of nodes.
  • Figure 4: (a) The model of a dynamic network in two consecutive snapshots. (b) An application of DynamicPPE to keep track of the embedding movements of interesting Wikipedia articles (vertices). We learn embeddings of two presidents of the United States on the whole English Wikipedia graph, monthly from 2012, which cumulatively involves 6.2M articles (nodes) and 170M internal links (edges). The embedding movement between two time points is defined as $1-\cos(\bm w_v^t, \bm w_v^{t+1})$ where $\cos(\cdot,\cdot)$ is the cosine similarity. The significant embedding movements may reflect big social status changes of Donald_Trump and Joe_Biden in this dynamic Wikipedia graph.
  • Figure 5: $\epsilon$ as a function of year for the task of node classification on the English Wikipedia graph. Each line corresponds to a fixed precision strategy of DynamicSNE. Clearly, when the precision parameter $\epsilon$ decreases, the performance of node classification improves.
  • ...and 33 more figures
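
The embedding movement defined in the caption of Figure 4, $1-\cos(\bm w_v^t, \bm w_v^{t+1})$, can be computed as in the following minimal sketch. The function name `embedding_movement` and the use of plain Python lists for embeddings are assumptions made for illustration; the thesis may compute this differently.

```python
import math


def embedding_movement(w_t, w_next):
    """Return 1 - cos(w_t, w_next) for two embeddings of the same node.

    w_t, w_next : equal-length sequences of floats, the node's embedding
                  at two consecutive time points (hypothetical inputs).
    """
    if len(w_t) != len(w_next):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(w_t, w_next))
    norm_t = math.sqrt(sum(a * a for a in w_t))
    norm_next = math.sqrt(sum(b * b for b in w_next))
    if norm_t == 0.0 or norm_next == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / (norm_t * norm_next)
```

The value ranges from 0 (the embedding direction is unchanged) to 2 (the direction is reversed), so a spike in this series for a node marks a month in which its neighborhood in the graph changed sharply.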

Theorems & Definitions (34)

  • Definition 1: Personalized PageRank Vector (PPV)
  • Lemma 1: Invariant property [hong2016discriminating]
  • Lemma 2: Approximation error and time complexity [andersen2006local, zhang2016approximate]
  • Lemma 3: Variational Formulation of Personalized PageRank
  • Definition 4: Simple dynamic graph model [kazemi2020representation]
  • Definition 5: Subset dynamic network embedding problem
  • Lemma 6
  • proof
  • Theorem 7
  • proof
  • ...and 24 more