Network Formation and Dynamics Among Multi-LLMs

Marios Papachristou, Yuan Yuan

TL;DR

This work investigates whether interactions among multiple large language models reproduce human-like network formation patterns. By deploying uniform and recommender-based candidate selection, real-world and synthetic networks, and discrete choice modeling, the study assesses micro-level principles—preferential attachment, triadic closure, and homophily—and macro-level properties like community structure and small-world behavior. Results show LLMs generally reproduce these network phenomena, with context-dependent shifts (e.g., stronger homophily in friendship-like settings and stronger heterophily in organizational contexts) and strong alignment with human decisions. The findings suggest LLMs can serve as powerful tools for social simulation and synthetic data generation, while highlighting important considerations about bias and alignment in AI-enabled network processes.

Abstract

Social networks profoundly influence how humans form opinions, exchange information, and organize collectively. As large language models (LLMs) are increasingly embedded into social and professional environments, it is critical to understand whether their interactions approximate human-like network dynamics. We develop a framework to study the network formation behaviors of multiple LLM agents and benchmark them against human decisions. Across synthetic and real-world settings, including friendship, telecommunication, and employment networks, we find that LLMs consistently reproduce fundamental micro-level principles such as preferential attachment, triadic closure, and homophily, as well as macro-level properties including community structure and small-world effects. Importantly, the relative emphasis of these principles adapts to context: for example, LLMs favor homophily in friendship networks but heterophily in organizational settings, mirroring patterns of social mobility. A controlled human-subject survey confirms strong alignment between LLMs and human participants in link-formation decisions. These results establish that LLMs can serve as powerful tools for social simulation and synthetic data generation, while also raising critical questions about bias, fairness, and the design of AI systems that participate in human networks.

Paper Structure

This paper contains 51 sections, 10 equations, 16 figures, 7 tables, 4 algorithms.

Figures (16)

  • Figure 1: Results for Principle 1 (preferential attachment). The multi-LLM setup was given neighborhood information $\{ N_{j, t} : j \in V_t \}$. (a, b): Probability of connecting to top-$k$-degree nodes for varying model (temperature fixed to 1.0 and environment to baseline), temperature (model fixed to GPT-3.5 and environment to baseline), and environment (model fixed to GPT-3.5 and temperature to 1.5) for networks generated according to Principle 1 with $n = 200$ nodes. (a) shows the whole range of $k$, and (b) shows the top $1$-$2.5\%$ of nodes. (c): Power-law exponents and standard errors for varying model, temperature, and environment. (d): Simulated networks. Power-law degree distributions are evident ($P > 0.5$, K-S test), with the networks at a temperature of 1.5 closely resembling the Barabási-Albert model ($P > 0.1$, K-S test) for GPT-3.5 agents.
  • Figure 2: Results for Principle 2 (triadic closure). (a, b): Probability of connecting to top-$k$ nodes (in terms of common neighbors) for varying model (temperature fixed to 1.0 and environment to baseline), temperature (model fixed to GPT-4 Mini and environment to baseline), and environment (model fixed to GPT-4 Mini and temperature to 0.5) for networks generated according to Principle 2 ($n = 50$, 10 simulations for each model, environment, and temperature). The dotted diagonal line corresponds to the null model, where connections are made at random. Panel (a) shows top-$k$ for $k$ in the range $10$-$50\%$, and Panel (b) shows top-$k$ for $k$ in the range $10$-$100\%$. (c): Marginal transitivity ($D$) and probability of an edge within a community ($\hat{p}$) for networks generated according to Principle 2 in different models, temperatures, and environments. The dotted line corresponds to the random null model. (d): The resulting networks created by GPT-4 Mini according to Principle 2 when the intersection of the neighborhoods of the query node and each alternative is provided, and a comparison of the metrics $D$ and $\hat{p}$ with the random null model. The node colors correspond to the groups to which each node belongs. The bold edges (red or blue) correspond to the newly created inter-cluster edges, and the orange edges correspond to the new intra-cluster edges. (e, f): Marginal transitivity ($D$) and network instances when the initial network is an Erdős-Rényi graph with $n = 50$ and $p = 0.1$.
  • Figure 3: Results for Principle 3 (homophily) and Principle 4 (community structure due to homophily). (a): Assortativity and Louvain modularity under Principle 3 ($n = 50$, 5 runs per row) across school, work, and community settings. All comparisons to the random null model ($R = 0$) are statistically significant ($P < 0.0003$, t-tests with Bonferroni correction for three tests). Modularity is also significantly greater than 0 ($P < 0.001$). (b): Network examples and communities for GPT-3.5 agents, compared to a null model where agents connect randomly ($R = 0$). (c): Influence of distractor features (favorite color, lucky number) on homophily, compared to a random null model with $R = 0$. All results are statistically significant ($P < 0.00025$), with Bonferroni correction over 3 tests (location, favorite color, hobby).
  • Figure 4: Fitted results for Principle 5 (small world) for $\beta = 0.25, k = 5$ (a1-a3), $\beta = 0.5, k = 5$ (b1-b3), and $\beta = 0.75, k = 5$ (c1-c3). (a1-c1): Average clustering coefficient $C$ and average shortest path length $L$, compared against a Watts-Strogatz null model with $n = 50$, $k = 5$, and the same $\beta \in \{ 0.25, 0.5, 0.75 \}$. The error bars correspond to 95% confidence intervals. The t-test comparing $L$ and $C$ for the LLM-generated networks and Watts-Strogatz networks yields $P > 0.05$ (Bonferroni correction for two tests). (a2-c2): Regression plots relating average shortest path length ($L$) and average clustering coefficient ($C$) with $n$. The value $a$ in the legends represents the effect size (slope of the regression lines). (a3-c3): Estimated values $\hat{\beta}$ of $\beta \in \{ 0.25, 0.5, 0.75 \}$ for LLM-generated networks, based on matching the average clustering coefficient and the difference in the average shortest path length between LLM-generated networks and Watts-Strogatz graphs with the estimated rewiring probability $\hat{\beta}$, for GPT-3.5 agents. We report the $P$-values of the t-test comparing the average shortest path length of the LLM-generated networks and the average shortest path length of the Watts-Strogatz graphs with rewiring probability $\hat{\beta}$. (d): Regression plot for the relation $L \sim \log (n)$ for different LLM models and environments (school, work, community) for $\beta = 0.25$ and $k = 5$. The legend shows the effect size ($a$) and the $P$-value. The results are compared against the Watts-Strogatz model with the same parameters $k$ and $\beta$ as a null model. (*: $P < 0.025$; **: $P < 0.005$; ***: $P < 0.0005$; Bonferroni correction for two tests, $L$ and $C$.)
  • Figure 5: Comparison between the network formation decisions among different models for the uniform and the recommendation system sampling strategies. We report the Spearman correlation between the effects corresponding to the fits as well as the total variation (TV) distance between the corresponding fitted models (cf. the alignment appendix for a detailed description of the metrics).
  • ...and 11 more figures
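
The small-world quantities in the Figure 4 caption, the average clustering coefficient $C$ and the average shortest path length $L$ of a Watts-Strogatz graph, can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the generator is a plain standard-library version of the Watts-Strogatz procedure, and it assumes an even $k$ ($k = 4$ here, since how an odd $k$ such as the paper's $k = 5$ is split between the two sides of the ring is not stated above). $L$ is averaged over reachable pairs only, which is also an assumption.

```python
import random
from collections import deque


def watts_strogatz(n, k, beta, seed=None):
    """Ring lattice on n nodes (k // 2 neighbors per side); each lattice
    edge is rewired to a random non-neighbor with probability beta."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < beta:
                # Candidate targets exclude self-loops and duplicate edges.
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient C (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest path length L over reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    # Same rewiring probabilities as in the caption; n = 50 as in the paper.
    for beta in (0.25, 0.5, 0.75):
        g = watts_strogatz(50, 4, beta, seed=0)
        print(beta, average_clustering(g), average_path_length(g))
```

With $\beta = 0$ the sketch returns the unrewired ring lattice, whose clustering coefficient is $3(k-2)/(4(k-1))$, which gives a quick sanity check before comparing any generated network against the null model.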