Capturing Tie Strength with Algebraic Topology

Arnab Sarker, Jean-Baptiste Seby, Austin R. Benson, Ali Jadbabaie

Abstract

The association between tie strength and social structure is a fundamental topic in the social sciences. We study this association by analyzing tie strength in higher-order networks, an increasingly relevant model which can encode group interactions between three or more individuals. First, we introduce three measures based on algebraic topology which characterize the network context and influence of an edge. Our experimental results across 15 datasets indicate that these measures outperform standard network proxies in estimating tie strength. We further find that these measures can replicate and explain a puzzle wherein certain bridging ties are surprisingly strong. We then consider a single centrality measure which combines the three initial measures, is highly inversely related to tie strength, and can be interpreted through an information exchange process which highlights ties that have access to useful information. In this sense, we are able to illuminate the information advantages of weak ties due to their network position.

Paper Structure

This paper contains 33 sections, 7 theorems, 26 equations, 4 figures, 3 tables.

Key Result

Proposition 1

Consider a simplicial complex $\mathcal{X}$ and an edge $e\in \mathcal{X}$. Then, $I_e^g = 1$ if and only if $e$ is a global bridge.
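Proposition 1 can be checked numerically on a small example. The sketch below assumes that $I_e^g$ is the squared norm of the gradient component of the edge indicator $\delta_e$ under the Hodge Decomposition; this definition is not stated on this page, so treat it as an assumption. The complex is the four-node example from Figure 2 plus a hypothetical pendant edge $\{4, 5\}$, added here only so that the example contains a global bridge. The function name `hodge_shares` is illustrative.

```python
import numpy as np

def hodge_shares(edges, triangles):
    """Return {edge: (gradient, curl, harmonic)} squared-norm shares of each edge indicator."""
    nodes = sorted({v for e in edges for v in e})
    node_idx = {v: n for n, v in enumerate(nodes)}
    edge_idx = {e: n for n, e in enumerate(edges)}

    # Node-to-edge incidence matrix B1; edge (i, j) with i < j is oriented i -> j.
    B1 = np.zeros((len(nodes), len(edges)))
    for (i, j), col in edge_idx.items():
        B1[node_idx[i], col] = -1.0
        B1[node_idx[j], col] = 1.0

    # Edge-to-triangle incidence matrix B2 for filled triangles (i, j, k), i < j < k.
    B2 = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        B2[edge_idx[(i, j)], col] = 1.0
        B2[edge_idx[(j, k)], col] = 1.0
        B2[edge_idx[(i, k)], col] = -1.0

    # Orthogonal projectors onto the gradient space im(B1^T) and the curl space im(B2).
    P_grad = np.linalg.pinv(B1) @ B1
    P_curl = B2 @ np.linalg.pinv(B2)
    P_harm = np.eye(len(edges)) - P_grad - P_curl

    # For an orthogonal projector P, ||P delta_e||^2 equals the diagonal entry P[e, e].
    return {e: (P_grad[n, n], P_curl[n, n], P_harm[n, n]) for e, n in edge_idx.items()}

# Figure 2 complex plus an assumed pendant edge (4, 5) acting as a global bridge.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]
triangles = [(1, 2, 3)]
shares = hodge_shares(edges, triangles)
for e, (g, c, h) in shares.items():
    print(e, round(g, 3), round(c, 3), round(h, 3))
```

Under these assumptions, the gradient share equals 1 only for the pendant edge, since every other edge lies on a cycle and its indicator therefore has a nonzero curl or harmonic part.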

Figures (4)

  • Figure 1: Replication of the "U"-shape with Hodge Decomposition in the sms-c dataset. (a) We find that estimates of tie strength computed using the Hodge Decomposition features replicate the non-monotonic relationship between tie strength and tie range. (b) In empirically analyzing the relationship between each Hodge Decomposition feature and tie range, we find that the gradient component increases in tie range, the curl component is only non-trivial for a tie range of 2, and the harmonic component is non-monotonic in tie range. This suggests that tie strength is context dependent. For ties of range 2, the high tie strength can be attributed to the contribution from the curl component, which then drops for all higher tie ranges. Then, as tie range increases from 3 onwards, we find that ties with a higher gradient component have a larger weight in the model than that of the harmonic component, resulting in long-range ties being stronger.
  • Figure 2: Illustration of the Edge PageRank measure. In this figure, we present the "lifted" interpretation of the Edge PageRank measure, though the measure can be computed with a direct matrix computation. (Lower Left) The example begins with a simplicial complex defined on four nodes. The nodes $\{1, 2, 3\}$ have been a part of a higher-order interaction, whereas nodes $2, 3,$ and $4$ have dyadic relationships but no higher-order interaction. In this example, we will compute the Edge PageRank score for the edge $e = \{2, 3\}$, which we initially represent with the indicator vector $\delta_e$. (Upper Left) The first step in Edge PageRank is to "lift" the indicator vector to a vector space which represents each possible direction of each edge, creating the vector $\widehat{\delta_e}$. (Upper Right) Edge PageRank then runs a standard (node) PageRank process in this lifted space. In this PageRank process, the "teleportation" vector is taken to be $\widehat{\delta_e}$ and transition probabilities correspond to a graph where directed edges are connected based on their underlying adjacency in the original simplicial complex. The resulting PageRank vector in the lifted space is represented by $\widehat{\pi}_e$. (Lower Right) Once the "lifted" PageRank process has converged, the resulting values are projected back to the original space of edges by taking the difference between the values corresponding to the two orientations of each edge. This result, $\pi_e$, is referred to as the Edge PageRank vector. (Lower Middle) To assign a score to each edge, we take the $2$-norm of the Edge PageRank vector. Although the edges other than $e = \{2, 3\}$ contribute minimally to the Edge PageRank score in this small example, we note that the teleportation parameter and the size of the network both affect the extent to which edges other than $e$ affect its Edge PageRank score.
  • Figure 3: Edge PageRank scores as a function of tie range in the sms-c dataset. We find that the Edge PageRank measure, in emphasizing the harmonic component of the indicator, also has a non-monotonic relationship with tie range and hence can ultimately capture the "U"-shape relationship between tie strength and tie range.
  • Figure 4: Illustration of the interpretation of the Edge PageRank diffusion as a communication process. Each state of the process corresponds to a directed edge, which we interpret as passed messages. If the state of the random walk is currently $[1, 2]$ (black), i.e. node $1$ has just sent a message to node $2$, then there are three possible types of steps that can be taken next. The lower walk (purple) indicates that, after node $2$ receives a message, then node $2$ may send a message to one of its neighbors, including node $1$. This walk is designed to be reversible (blue), due to the lifted graph being undirected, which we interpret as node $1$ seeking information after sending a message to node $2$. The upper walk (green) represents the effect of higher order information in the process, and enables two communication possibilities which are natural in a co-present interaction. Once node $1$ sends a message to node $2$, then $1$ can send a message to node $3$, as if a message is being sent to both members, or node $3$ might send a message to $2$, as a reaction to node $1$'s message to node $2$. We note that if the triangle were unfilled, then neither green arrow would be possible.
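The lift, diffuse, project, and score steps described in the captions of Figures 2 and 4 can be sketched in a few lines. This is a simplified stand-in, not the paper's construction: the lifted walk here moves uniformly over an undirected graph on directed edges built from the "lower" and "upper" moves described in Figure 4, whereas the paper derives its transition probabilities from the normalized $1$-Hodge Laplacian (Definition 5 and Theorem 1), which is not reproduced on this page. The function name `edge_pagerank_sketch`, the uniform weights, and the teleportation value `beta = 0.2` are all assumptions.

```python
import itertools
import numpy as np

# Complex from Figure 2: filled triangle {1, 2, 3}; nodes 2, 3, 4 form an unfilled cycle.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
triangles = [(1, 2, 3)]
m = len(edges)

# Lifted states: both orientations of every edge (forward block first, reversed block second).
states = edges + [(j, i) for (i, j) in edges]
index = {s: n for n, s in enumerate(states)}

# Undirected adjacency between lifted states (assumed uniform weights).
A = np.zeros((2 * m, 2 * m))

def connect(a, b):
    A[index[a], index[b]] = 1.0
    A[index[b], index[a]] = 1.0

# Lower moves: after i -> j, node j may message any neighbour k, including i.
for (i, j) in states:
    for (j2, k) in states:
        if j2 == j:
            connect((i, j), (j, k))

# Upper moves: inside a filled triangle {i, j, k}, i -> j may be followed by i -> k or k -> j.
for tri in triangles:
    for i, j, k in itertools.permutations(tri):
        connect((i, j), (i, k))
        connect((i, j), (k, j))

P = A / A.sum(axis=1, keepdims=True)          # row-stochastic lifted walk
V = np.hstack([np.eye(m), -np.eye(m)])        # projection: forward minus reversed orientation

def edge_pagerank_sketch(e, beta=0.2):
    """Lift delta_e, run personalized PageRank in the lifted space, project back, take the 2-norm."""
    delta = np.zeros(m)
    delta[edges.index(e)] = 1.0
    x_hat = V.T @ delta                       # lifted (signed) teleportation vector
    pi_hat = beta * np.linalg.solve(np.eye(2 * m) - (1 - beta) * P.T, x_hat)
    pi = V @ pi_hat                           # difference between the two orientations
    return pi, pi_hat, np.linalg.norm(pi)

pi, pi_hat, score = edge_pagerank_sketch((2, 3))
print(np.round(pi, 3), round(score, 3))
```

Because the teleportation vector is signed, the lifted solution is antisymmetric across the two orientations of each edge, which is why the projection takes a difference rather than a sum.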

Theorems & Definitions (20)

  • Definition 1: Simplicial Complex (hatcher2002algebraic)
  • Definition 2: $1$-Hodge Laplacian (schaub2018random)
  • Definition 3: Hodge Decomposition
  • Definition 4: Gradient, Curl, and Harmonic of an Indicator
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • proof
  • Definition 5: Normalized $1$-Hodge Laplacian (schaub2018random)
  • Theorem 1: Stochastic Lifting of $\mathcal{L}_1$ (schaub2018random)
  • ...and 10 more