Graph Neural Network for Crawling Target Nodes in Social Networks

Kirill Lukyanov, Mikhail Drobyshevskiy, Danil Shaikhelislamov, Denis Turdakov

TL;DR

This work addresses efficient discovery of target nodes in unknown social graphs under a query budget by leveraging Graph Neural Networks to score candidate nodes based on local neighborhoods. A novel sample boosting technique augments training data during crawling, improving predictor quality in early stages. Empirical results show GNN-based crawlers often outperform classical predictors and exhibit reduced variance, with SAGE and GAT delivering strong performance across diverse target-topology datasets. The approach demonstrates practical potential for scalable targeted crawling in distributed or heterogeneous social networks, and points to future work in richer GNN architectures and online predictor switching.

Abstract

Crawling social networks has been an active research topic in recent years. One challenging task is to collect target nodes in an initially unknown graph given a budget of crawling steps. Predicting a node property from its partially known neighbourhood is at the heart of a successful crawler. In this paper we adopt graph neural networks for this purpose and show that they are competitive with traditional classifiers and outperform them in individual cases. Additionally, we propose a training sample boosting technique, which helps diversify the training set at early stages of crawling and thus improves predictor quality. An experimental study on three types of target set topology indicates that the GNN-based approach has potential for the crawling task, especially when the target nodes are distributed.

Paper Structure (12 sections, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Crawling process. Crawled nodes and edges are black, observed ones are grey, and unknown ones are drawn dotted. Target nodes are labelled 'T'. The property of the observed node marked '?' is predicted from its second neighborhood. The GNN predictor structure for this node is shown in green.
  • Figure 2: Comparison of crawlers on graphs of each type. GNNs achieve better results than classical predictors and have less variance.
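
The crawling process in Figure 1 can be illustrated with a minimal sketch of a budgeted crawler: at each step it scores the observed (not yet crawled) nodes, crawls the best one, learns its label, and reveals its neighbours. All names here (`crawl`, `crawled_target_fraction`, the adjacency-dict graph format) are hypothetical, and the scoring function is a simple stand-in that counts target neighbours. It is not the paper's GNN predictor, which would take this function's place. The sketch also assumes that crawling the seed costs one step of the budget.

```python
from typing import Callable, Dict, List, Set

Adjacency = Dict[int, Set[int]]
Scorer = Callable[[int, Adjacency, Dict[int, bool]], float]


def crawled_target_fraction(node: int, observed: Adjacency,
                            labels: Dict[int, bool]) -> float:
    """Placeholder predictor (NOT the paper's GNN): the share of the node's
    already crawled neighbours that turned out to be targets."""
    known = [labels[v] for v in observed.get(node, set()) if v in labels]
    return sum(known) / len(known) if known else 0.0


def crawl(graph: Adjacency, targets: Set[int], seed: int, budget: int,
          score: Scorer = crawled_target_fraction) -> List[int]:
    """Crawl at most `budget` nodes starting from `seed`.

    `graph` is the full, hidden graph; the crawler only ever reads the
    neighbour list of a node it has decided to crawl. Returns the target
    nodes found, in the order they were crawled.
    """
    observed: Adjacency = {seed: set()}  # partial graph revealed so far
    labels: Dict[int, bool] = {}         # crawled node -> is it a target?
    frontier: Set[int] = {seed}          # observed but not yet crawled
    found: List[int] = []

    for _ in range(budget):
        if not frontier:
            break
        # Pick the highest-scoring candidate; ties go to the smallest id.
        node = max(sorted(frontier), key=lambda n: score(n, observed, labels))
        frontier.remove(node)

        # Crawling a node reveals its label and its neighbours.
        labels[node] = node in targets
        if labels[node]:
            found.append(node)
        for nb in graph[node]:
            observed.setdefault(node, set()).add(nb)
            observed.setdefault(nb, set()).add(node)
            if nb not in labels:
                frontier.add(nb)
    return found
```

A learned predictor can be swapped in through the `score` argument, as long as it takes the candidate node, the partially observed graph, and the labels of crawled nodes, and returns a number where higher means "more likely a target".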