A Binary Classification Social Network Dataset for Graph Machine Learning

Adnan Ali, Jinglong Li, Huanhuan Chen, AlMotasem Bellah Al Ajlouni

TL;DR

The paper introduces BiSND, a binary-classification social network dataset designed for graph machine learning, available in both tabular and graph formats to enable diverse modeling approaches. It evaluates supervised, self-supervised, and graph-based methods (DT, RF, XGBoost, KNN, MLP, GCN, and GCL methods BGRL, GRACE, DAENS) across multiple graph variants, achieving F1-scores from the mid-60s to about 70%. DAENS, especially the 2D2D variant, delivers the best performance among the graph-contrastive methods, illustrating the potential of self-supervised learning for sparse social-graph classification. The results demonstrate BiSND’s robustness and suitability for classification tasks, while highlighting that tree-based methods excel on tabular data and graph/self-supervised approaches can surpass them on graph data. Overall, BiSND provides a solid benchmark for future exploration in graph ML on binary social-network problems and social-media presence detection.

Abstract

Social networks give rise to a wide range of graph applications. The available benchmark datasets cover citation, co-occurrence, and e-commerce networks, among others, with 3 to 15 classes. However, there is no benchmark social network dataset for classification in graph machine learning. This paper fills the gap and presents the Binary Classification Social Network Dataset (BiSND), designed for graph machine learning applications that predict binary classes. We present BiSND in tabular and graph formats to verify its robustness across classical and advanced machine learning. We employ a diverse set of classifiers, including four traditional machine learning algorithms (Decision Tree, K-Nearest Neighbours, Random Forest, XGBoost), one deep neural network (Multi-Layer Perceptron), one graph neural network (Graph Convolutional Network), and three state-of-the-art graph contrastive learning methods (BGRL, GRACE, DAENS). Our findings show that BiSND is suitable for classification tasks, with F1-scores ranging from 67.66 to 70.15, indicating room for future enhancement.
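For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score for a binary task can be computed from predictions. The labels and predictions are toy values invented for illustration, not BiSND data, and the abstract does not state which averaging scheme (positive-class, macro, or weighted) the authors use; the positive-class form shown here is an assumption.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Return the F1-score of the positive class as a fraction in [0, 1]."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not taken from BiSND).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = binary_f1(y_true, y_pred)
print(f"F1 = {100 * score:.2f}")
```

Multiplying the fraction by 100 gives the percentage scale on which the abstract's range of 67.66 to 70.15 is reported.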

Paper Structure

This paper contains 44 sections, 6 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Model diagram for creating and verifying BiSND. First, users are extracted and labeled; then tabular and graph data are created from the users. Finally, supervised and self-supervised machine learning models are applied to check the authenticity of BiSND. NodifyGCL refers to node classification graph contrastive learning methods.
  • Figure 2: Decision tree results on four metrics.
  • Figure 3: Random forest results on four metrics.
  • Figure 4: XGB results on four metrics.
  • Figure 5: KNN results on all metrics.
  • ...and 1 more figures