Theory Discovery in Social Networks: Automating ERGM Specification with Large Language Models

Yidan Sun, Mayank Kejriwal

Abstract

Understanding how social networks form, whether through reciprocity, shared attributes, or triadic closure, is central to computational social science. Exponential Random Graph Models (ERGMs) offer a principled framework for testing such formation theories, but translating qualitative social hypotheses into stable statistical specifications remains a significant barrier, requiring expertise in both network theory and model estimation. We present Forge (Formation-Oriented Reasoning with Guarded ERGMs), a framework that uses large language models to automate this translation. Given a network and an informal description of the social context, Forge proposes candidate formation mechanisms, validates them against feasibility and stability constraints, and iteratively refines specifications using goodness-of-fit diagnostics. Evaluation across twelve benchmark networks spanning schools, organizations, and online communication shows that Forge converges in 10 of 12 cases, and conditional on convergence it achieves the best likelihood-based fit in 9 of 10 while meeting adequacy thresholds. By combining LLM-based proposals with statistical guardrails, Forge reduces the manual effort required for ERGM specification.

Paper Structure (42 sections, 7 equations, 3 figures, 2 tables, 2 algorithms)

Figures (3)

  • Figure 1: The overall framework of Forge. Stage I (Candidate Specification Generation) uses network diagnostics and attribute information to prompt an LLM to propose candidate ERGM terms and assemble complete model specifications. Stage II (Screening and Model Selection) evaluates these candidates using feasibility checks and fast MPLE screening, selecting a single specification for likelihood-based estimation. Stage III (Iterative Specification Refinement) refits and incrementally updates the selected model using goodness-of-fit diagnostics under fixed stability constraints to improve adequacy. Stage IV (Post-hoc Theory Interpretation) summarizes the final fitted specification in terms of recognizable social mechanisms such as reciprocity, triadic closure, and homophily.
  • Figure 2: Term proposal performance across five LLMs. Precision measures the fraction of proposed terms that are admissible; recall measures the fraction of admissible terms proposed; off-menu rate is the fraction of proposals violating feasibility or stability constraints.
  • Figure 3: Outputs produced by Forge for two benchmark networks. Each panel reports the final ERGM specification returned by the framework, together with the associated mapping from ERGM terms to network mechanisms and their substantive interpretation. (a) Noordin Top terrorist network. (b) Glasgow s50 social network (directed).
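
The three term-proposal metrics named in the Figure 2 caption can be illustrated with a minimal sketch. It is not the paper's evaluation code. The function name `term_proposal_metrics`, the use of plain sets of term names, and the example terms are all assumptions made for illustration. The sketch also assumes that the set of constraint-violating terms is supplied separately rather than derived from the admissible set.

```python
from typing import Dict, Iterable


def term_proposal_metrics(
    proposed: Iterable[str],
    admissible: Iterable[str],
    violating: Iterable[str],
) -> Dict[str, float]:
    """Compute precision, recall, and off-menu rate for one set of proposals.

    All three arguments are hypothetical inputs: collections of ERGM term names.
    """
    proposed_set = set(proposed)
    admissible_set = set(admissible)
    violating_set = set(violating)

    # Proposed terms that are also admissible.
    hits = proposed_set & admissible_set

    # Precision: fraction of proposed terms that are admissible.
    precision = len(hits) / len(proposed_set) if proposed_set else 0.0
    # Recall: fraction of admissible terms that were proposed.
    recall = len(hits) / len(admissible_set) if admissible_set else 0.0
    # Off-menu rate: fraction of proposals that violate a constraint.
    off_menu = (
        len(proposed_set & violating_set) / len(proposed_set)
        if proposed_set
        else 0.0
    )
    return {"precision": precision, "recall": recall, "off_menu_rate": off_menu}


# Illustrative (made-up) example: four proposals, one of which is flagged.
metrics = term_proposal_metrics(
    proposed=["edges", "mutual", "gwesp", "triangle"],
    admissible=["edges", "mutual", "gwesp", "nodematch"],
    violating=["triangle"],
)
print(metrics)
```

In this made-up example three of the four proposals are admissible and three of the four admissible terms are proposed, so precision and recall are both 0.75. One of the four proposals is flagged, so the off-menu rate is 0.25.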