
More than Meets the Tie: Examining the Role of Interpersonal Relationships in Social Networks

Minje Choi, Ceren Budak, Daniel M. Romero, David Jurgens

TL;DR

This work examines how interpersonal relationship types shape communication and information diffusion on Twitter, using a large-scale dataset of 9.6 million dyads with self-declared labels for five relationship categories. It introduces a RoBERTa-based hierarchical model that combines text from tweets and bios with network features to classify relationship types, achieving a macro $F_1$ score of $0.70$, well above baselines. The study further shows that incorporating relationship type improves retweet prediction, yielding a 1% lift in $F_1$ and a 2% gain in recall, with recall gains especially notable for content without URLs. Overall, the findings highlight the value of relationship-aware network modeling for understanding diffusion processes and social dynamics on online platforms.

Abstract

Topics in conversations depend in part on the type of interpersonal relationship between speakers, such as friendship, kinship, or romance. Identifying these relationships can provide a rich description of how individuals communicate and reveal how relationships influence the way people share information. Using a dataset of more than 9.6M dyads of Twitter users, we show how relationship types influence language use, topic diversity, communication frequencies, and diurnal patterns of conversations. These differences can be used to predict the relationship between two users, with the best predictive model achieving a macro F1 score of 0.70. We also demonstrate how relationship types influence communication dynamics through the task of predicting future retweets. Adding relationships as a feature to a strong baseline model increases F1 and recall by 1% and 2%, respectively. The results of this study suggest relationship types have the potential to provide new insights into how communication and information diffusion occur in social networks.
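
Macro F1 averages the per-class F1 scores with equal weight, so a rare relationship category counts as much as a common one. The sketch below is a minimal plain-Python illustration, not the authors' evaluation code; the function names and toy labels are hypothetical, and the category names are borrowed from the figure captions. It follows the common convention of averaging over every label that appears in either the gold or predicted list and scoring a class with no true positives as 0.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 (also avoids 0/0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical toy predictions, not data from the paper.
y_true = ["family", "family", "romance", "social", "social", "organizational"]
y_pred = ["family", "social", "romance", "social", "family", "organizational"]
print(macro_f1(y_true, y_pred))  # 0.75
```

Here "family" and "social" each score 0.5 and the other two classes score 1.0, giving a macro average of 0.75 even though only four of six predictions are correct.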

Paper Structure

This paper contains 31 sections, 9 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Probability that a directed mention to a given relationship type contains a word from a LIWC category. Romance and parasocial relationships express high levels of self-disclosure by using more singular pronouns, while organizational relationships use more plural pronouns to signal collective identity. Swearing is most common in social relationships and least common in organizational ones, possibly reflecting differences in social distance. Work- and family-related words are associated with the corresponding relationship categories. Here and throughout the paper, error bars denote bootstrapped 95% confidence intervals.
  • Figure 1: A comparison of mention frequency across hours of day between dyads with (a) labeled relationships obtained through self-declared mentions, and (b) inferred relationships obtained through the relationship prediction classifiers. Some relationship-specific characteristics, such as the concentration of daytime communication in organizational relationships, are also visible in the inferred categories. Shaded regions show 95% bootstrapped confidence intervals.
  • Figure 2: The average entropy of topic distributions obtained from directed mention tweets. The entropy is significantly higher for social and romance relationships, indicating that conversations in these relationships cover a wider range of topics.
  • Figure 3: Network and communication features. Jaccard and Adamic-Adar scores are lowest for parasocial relationships, indicating little overlap between the network neighborhoods of the two users in a dyad. Romance has both the highest mention probability and the highest reciprocity, signalling the strongest mutual communication.
  • Figure 4: Mention frequency across hours of day reveals striking differences in temporal dynamics between relationship categories (a, b) and subcategories (c, d); panels (b), (c), and (d) are centered relative to the mean temporal distribution across all relationship categories. (a) Un-centered communication frequency by category; (b) centered communication frequency by category; (c) centered communication frequency for four Romance subcategories; (d) centered communication frequency for four Family subcategories. Shaded regions show 95% bootstrapped confidence intervals.
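
The topic-diversity comparison in Figure 2 rests on the Shannon entropy of a topic distribution, $H(\theta) = -\sum_k \theta_k \log \theta_k$, averaged within each relationship type. The sketch below is a minimal illustration under stated assumptions: the topic distributions are taken as given (from whatever topic model produced them), the natural log is used, and it is not stated here whether the paper averages per tweet or per dyad, so the helper simply averages over whatever units it receives. The function names and toy records are hypothetical.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (natural log) of a probability vector; 0*log(0) counts as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mean_entropy_by_type(records):
    """Average entropy per relationship type.

    records: iterable of (relationship_type, topic_distribution) pairs.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for rel, dist in records:
        sums[rel] += entropy(dist)
        counts[rel] += 1
    return {rel: sums[rel] / counts[rel] for rel in sums}


# Hypothetical toy distributions over four topics.
records = [
    ("romance", [0.25, 0.25, 0.25, 0.25]),       # spread evenly: high entropy
    ("organizational", [1.0, 0.0, 0.0, 0.0]),    # one topic: zero entropy
    ("organizational", [0.5, 0.5, 0.0, 0.0]),
]
print(mean_entropy_by_type(records))
```

A distribution concentrated on one topic has entropy 0, while a uniform distribution over $K$ topics reaches the maximum $\log K$, which is why higher average entropy is read as more topics per conversation.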
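
The network features in Figure 3 can be illustrated with the standard definitions: the Jaccard coefficient $|N(u) \cap N(v)| / |N(u) \cup N(v)|$ and the Adamic-Adar index $\sum_{w \in N(u) \cap N(v)} 1/\log |N(w)|$. The minimal sketch below assumes an undirected, symmetric adjacency map; which graph the paper uses (follow, mention, or otherwise) is not specified here. The reciprocity helper uses a hypothetical definition (both users mention each other at least once), and all names and toy data are illustrative only.

```python
import math


def jaccard(adj, u, v):
    """Share of the union of u's and v's neighbours that they have in common."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1/log(degree) over common neighbours; rare shared contacts weigh more."""
    # Skip degree-1 nodes so log(1) = 0 never divides (only possible if adj is asymmetric).
    return sum(1 / math.log(len(adj[w])) for w in adj[u] & adj[v] if len(adj[w]) > 1)


def is_reciprocal(mentions, u, v):
    """Hypothetical reciprocity: True if u mentioned v and v mentioned u."""
    return (u, v) in mentions and (v, u) in mentions


# Hypothetical undirected toy graph as a symmetric adjacency map.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
print(jaccard(adj, "a", "b"))      # 2 shared of 3 total neighbours
print(adamic_adar(adj, "a", "b"))  # 1/log(2) + 1/log(3)
```

NetworkX offers `jaccard_coefficient` and `adamic_adar_index` for the same quantities on a `Graph`; the hand-rolled version is shown only to make the formulas explicit.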
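
The error bars and shaded regions throughout the figures are described as bootstrapped 95% confidence intervals. The captions do not state the variant or the number of resamples, so the sketch below assumes a plain percentile bootstrap with 1,000 resamples and a nearest-rank percentile. It is applied to hypothetical 0/1 indicators of whether each mention contains a LIWC-category word, the kind of proportion plotted in the LIWC figure; the function name and data are illustrative only.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.fmean, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, recompute `stat`,
    and take nearest-rank percentiles of the sorted estimates."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = round(alpha / 2 * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical indicators: 1 if a mention contains a word from the LIWC category.
contains_word = [1] * 30 + [0] * 70
low, high = bootstrap_ci(contains_word)
print(statistics.fmean(contains_word), (low, high))
```

Resampling whole units (here, individual mentions) keeps the interval honest about sampling noise without assuming a particular distribution; if mentions from the same dyad are correlated, resampling dyads instead would be the more conservative choice.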
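
Figure 4 centers each category's hourly mention distribution relative to the mean temporal distribution across all categories, so each curve shows where a category departs from the typical daily rhythm. The minimal sketch below assumes the mean is the unweighted average of the per-category distributions (rather than a pool of all mentions) and that timestamps are already converted to each user's local hour; both are assumptions, and the names and toy data are hypothetical.

```python
from collections import Counter

HOURS = range(24)


def hourly_distribution(hours):
    """Fraction of mentions falling in each hour of day (0-23)."""
    counts = Counter(hours)
    return [counts[h] / len(hours) for h in HOURS]


def centered_distributions(hours_by_type):
    """Subtract the unweighted mean hourly distribution across types from each type's."""
    dists = {rel: hourly_distribution(h) for rel, h in hours_by_type.items()}
    mean = [sum(d[h] for d in dists.values()) / len(dists) for h in HOURS]
    return {rel: [d[h] - mean[h] for h in HOURS] for rel, d in dists.items()}


# Hypothetical local-time hours of mentions per relationship type.
hours_by_type = {
    "organizational": [9, 10, 11, 14],
    "romance": [22, 23, 0, 1],
}
centered = centered_distributions(hours_by_type)
print(centered["organizational"][9])  # positive: above the cross-type average at 9am
```

Because every distribution sums to 1, each centered curve sums to 0, and at any hour the centered values across categories also sum to 0.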