Understanding friendship formation with explainable machine learning

María Pereda

Abstract

Understanding the formation of social ties requires disentangling the roles of individual traits and local network structure. We analyse signed social relationships among 3,395 students using an interpretable machine learning model, the Explainable Boosting Machine (EBM), to predict link polarity from individual attributes (prosociality, cognitive reflection, and gender) and a structural metric, triadic influence. Our results show that triadic influence overwhelmingly dominates link prediction, confirming that local network structure is the primary driver of social relationships. Nevertheless, a small subset of links (0.24%) is explained primarily by individual-level traits. A detailed characterisation of this subset reveals that these links do not arise from distinct structural conditions but instead correspond to weaker, less structurally embedded relationships. In particular, they are more likely to be negative ties and exhibit lower levels of structural balance, whereas triadic-dominant links are strongly associated with positive relationships and highly balanced configurations. Furthermore, we find that links without indirect structural paths are explained not by individual traits but by the absence of structural reinforcement itself. These findings support a layered view of social tie formation, in which structural mechanisms dominate globally while individual-level effects emerge in specific, less constrained contexts. More broadly, our work highlights the value of explainable machine learning for uncovering the mechanisms underlying social network formation.
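The abstract relies on two structural quantities, triadic influence and structural balance, whose exact definitions are not given in this section. The following is a minimal sketch under stated assumptions: triadic influence for a pair (i, j) is taken, hypothetically, as the mean product of signs along all two-step paths i-k-j through common neighbours k (the paper's formula may differ). Balance uses the standard signed-triad rule: a closed triad is balanced when the product of its three signs is positive. The helper names and the toy edge list are illustrative only, not data from the study.

```python
# Signed edges: +1 = friendship, -1 = enmity. Toy data for illustration only.
EDGES = [
    ("a", "b", 1), ("a", "c", 1), ("b", "c", 1),
    ("a", "d", -1), ("b", "d", -1), ("c", "d", 1),
    ("c", "e", 1), ("e", "f", -1),
]


def build_signed_graph(edges):
    """Return an adjacency dict mapping node -> {neighbour: sign} (undirected)."""
    adj = {}
    for u, v, sign in edges:
        adj.setdefault(u, {})[v] = sign
        adj.setdefault(v, {})[u] = sign
    return adj


def common_neighbours(adj, i, j):
    """Nodes k adjacent to both i and j, i.e. indirect paths i-k-j."""
    return (set(adj.get(i, {})) & set(adj.get(j, {}))) - {i, j}


def triadic_influence(adj, i, j):
    """HYPOTHETICAL definition: mean of s_ik * s_kj over common neighbours k.

    Returns None when i and j share no neighbour, i.e. the pair has no
    indirect structural path and structure offers no reinforcement.
    """
    common = common_neighbours(adj, i, j)
    if not common:
        return None
    return sum(adj[i][k] * adj[k][j] for k in common) / len(common)


def is_balanced(adj, a, b, c):
    """A closed signed triad is balanced when the product of its signs is +1."""
    return adj[a][b] * adj[b][c] * adj[a][c] > 0


def balanced_fraction(adj, i, j):
    """Fraction of closed triads through the existing edge (i, j) that are balanced."""
    common = common_neighbours(adj, i, j)
    if not common:
        return None
    return sum(is_balanced(adj, i, j, k) for k in common) / len(common)


adj = build_signed_graph(EDGES)
print(triadic_influence(adj, "a", "b"), balanced_fraction(adj, "a", "b"))
print(triadic_influence(adj, "c", "d"), balanced_fraction(adj, "c", "d"))
print(triadic_influence(adj, "e", "f"))
```

In this toy graph the positive tie a-b sits in fully balanced triads and its common neighbours agree with its sign, whereas the positive tie c-d is contradicted by both of its two-step paths and belongs only to unbalanced triads. The pair e-f has no common neighbour, so under this assumed definition its triadic influence is undefined, which mirrors the abstract's category of links without indirect structural paths.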

Paper Structure

This paper contains 9 sections, 5 equations, 7 figures and 1 table.

Figures (7)

  • Figure 1: Global feature importance derived from the EBM. Each bar shows the average contribution of the corresponding predictor to the model's log-odds output across all samples. Triadic influence is by far the most influential variable.
  • Figure 2: Dependence plot for triadic influence. The curve represents the contribution of triadic influence to the model’s log-odds output as a function of its value. Positive contributions increase the probability of friendship, while negative contributions favour enmity.
  • Figure 3: Dependence plots for cognitive reflection (CRT). Each panel shows the contribution of the variable to the model’s log-odds output.
  • Figure 4: Dependence plots for prosociality. Each panel shows the contribution of the variable to the model’s log-odds output.
  • Figure 5: Local feature contributions for a single predicted link. Each bar represents the contribution of a predictor to the model's log-odds. Positive values increase the probability of a positive link, while negative values decrease it. In this example, the prediction is primarily driven by individual-level traits rather than structural information.
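An EBM is a generalised additive model: its log-odds output for a link is an intercept plus one learned contribution per term, and these per-term contributions are what the global (Figure 1) and local (Figure 5) explanations display. The following is a minimal sketch of that decomposition, assuming per-link contributions are already available as plain dictionaries. The term names, the contribution values, the use of mean absolute contribution as "global importance", and the rule in `is_individual_dominant` are all hypothetical; the paper's exact criterion for a link being primarily explained by individual-level traits is not stated in this section.

```python
import math

# Hypothetical term names; the study's actual feature encoding may differ.
TRIADIC = "triadic_influence"
INDIVIDUAL = ("prosociality", "crt", "gender")


def predict_proba(intercept, contributions):
    """Additive EBM-style prediction: log-odds = intercept + sum of term contributions."""
    log_odds = intercept + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic link -> P(positive tie)


def global_importance(all_contributions):
    """Mean absolute contribution of each term across all links (assumed definition)."""
    n = len(all_contributions)
    terms = all_contributions[0].keys()
    return {t: sum(abs(c[t]) for c in all_contributions) / n for t in terms}


def is_individual_dominant(contributions):
    """HYPOTHETICAL rule: individual traits jointly outweigh the triadic term."""
    individual = sum(abs(contributions[t]) for t in INDIVIDUAL)
    return individual > abs(contributions[TRIADIC])


def individual_dominant_share(all_contributions):
    """Fraction of links whose prediction is dominated by individual-level terms."""
    flags = [is_individual_dominant(c) for c in all_contributions]
    return sum(flags) / len(flags)


# Toy per-link contributions (log-odds units), not results from the study.
links = [
    {"triadic_influence": 2.0, "prosociality": 0.1, "crt": -0.05, "gender": 0.05},
    {"triadic_influence": 0.1, "prosociality": -0.3, "crt": 0.2, "gender": -0.1},
]

print(global_importance(links))
print([is_individual_dominant(c) for c in links])
print(individual_dominant_share(links))
print(predict_proba(0.0, links[0]), predict_proba(0.0, links[1]))
```

Because the model is additive, comparing the magnitude of the triadic term against the individual terms for a single link is a direct reading of its local explanation. Aggregating that comparison over all links yields a share of individual-dominant links, analogous in spirit to the 0.24% reported in the abstract, although the real figure depends on the paper's own definition.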