GitHub Stargazers | Building Graph- and Edge-level Prediction Algorithms for Developer Social Networks

Karishma Thakrar, Aniket Chauhan

TL;DR

The paper classifies GitHub developer networks as web development or machine learning communities and predicts potential collaborations between developers. It employs Graph Convolutional Networks (GCNs) for graph classification and GraphSAGE for edge-level link prediction, with a Random Forest applied to GCN embeddings to boost performance. Results indicate moderate classification performance (AUC ≈ 0.74) and effective edge recommendations across graphs, highlighting the utility of graph-based analysis for open-source communities. The study provides a scalable framework for market analysis and engagement recommendations within developer networks, and identifies richer features and temporal dynamics as directions for future work.

Abstract

Analyzing social networks formed by developers provides valuable insights for market segmentation, trend analysis, and community engagement. In this study, we explore the GitHub Stargazers dataset to classify developer communities and predict potential collaborations using graph neural networks (GNNs). By modeling 12,725 developer networks, we segment communities based on their focus on web development or machine learning repositories, leveraging graph attributes and node embeddings. Furthermore, we propose an edge-level recommendation algorithm that predicts new connections between developers using similarity measures. Our experimental results demonstrate the effectiveness of our approach in accurately segmenting communities and improving connection predictions, offering valuable insights for understanding open-source developer networks.
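
To make the edge-level recommendation idea concrete, the following is a minimal sketch of similarity-based link scoring, not the authors' implementation. It assumes each developer already has an embedding vector (in the paper these would come from a model such as GraphSAGE; here they are hand-written toy values), and it assumes cosine similarity as the similarity measure. The names `cosine`, `recommend_edges`, and the `dev_*` identifiers are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_edges(embeddings, existing_edges, top_k=2):
    """Rank node pairs that are not yet connected by embedding similarity.

    embeddings: dict mapping node id -> list of floats (assumed precomputed).
    existing_edges: iterable of (node, node) pairs treated as undirected.
    Returns the top_k candidate edges as (node, node, rounded score) tuples.
    """
    existing = {frozenset(edge) for edge in existing_edges}
    scored = []
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in existing:
            continue  # skip pairs that are already connected
        scored.append((cosine(embeddings[a], embeddings[b]), a, b))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(a, b, round(score, 3)) for score, a, b in scored[:top_k]]


# Toy embeddings: two developers near each axis of a 2-D space.
embeddings = {
    "dev_a": [1.0, 0.0],
    "dev_b": [0.9, 0.1],
    "dev_c": [0.0, 1.0],
    "dev_d": [0.1, 0.9],
}
existing_edges = [("dev_a", "dev_c")]

recommendations = recommend_edges(embeddings, existing_edges)
for a, b, score in recommendations:
    print(f"{a} -- {b}: {score}")
```

In this toy setup the two highest-scoring candidates pair up the developers whose embeddings point in nearly the same direction. A learned link predictor would typically replace the fixed cosine score with a trained scoring function, but the ranking step would look similar.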

Paper Structure

This paper contains 20 sections, 1 equation, 9 figures.

Figures (9)

  • Figure 1: Comparison of Node Count and Edge Count for Each Class.
  • Figure 2: Correlation Heatmap of Graph Statistics.
  • Figure 3: Comparing Underlying Network Structures for Each Class.
  • Figure 4: Example Networks in Web Development and Machine Learning Demonstrating Differences in Connectivity.
  • Figure 5: Comparison of Graph Convolutional Network Classifiers.
  • ...and 4 more figures