On the effect of the average clustering coefficient on topology-based link prediction in featureless graphs

Mehrdad Rafiepour, S. Mehdi Vahidipour

TL;DR

The paper investigates whether the average clustering coefficient (AvCC) can serve as a density-aware criterion for choosing among topology-based link prediction methods in featureless graphs. It develops a Barabási–Albert–based graph alteration method that raises AvCC in a controlled way while preserving heterogeneity, and derives an empirical AvCC boundary (approximately $0.27$–$0.37$) that separates the regime where degree-dependent methods outperform common-neighbor scores from the regime where they do not. The boundary is validated on real networks and Open Graph Benchmark datasets, showing that Jaccard-based (common-neighbor) predictions excel in denser graphs, while the degree-dependent HEI/HOI scores are preferable in sparser regimes. The work provides a practical, data-light guideline for selecting topology-based link predictors without node features, highlighting the role of graph density in determining method effectiveness.

Abstract

Link prediction is a fundamental problem in graph theory with diverse applications, including recommender systems, community detection, and identifying spurious connections. While feature-based methods achieve high accuracy, their reliance on node attributes limits their applicability in featureless graphs. For such graphs, structure-based approaches, including common-neighbor-based and degree-dependent methods, are commonly employed. However, the effectiveness of these methods depends on graph density, with common-neighbor-based algorithms performing well in dense graphs and degree-dependent methods being more suitable for sparse or tree-like graphs. Despite this, the literature lacks a clear criterion to distinguish between dense and sparse graphs. This paper introduces the average clustering coefficient as a criterion for assessing graph density to assist with the choice of link prediction algorithms. To address the scarcity of datasets for empirical analysis, we propose a novel graph generation method based on the Barabási–Albert model, which enables controlled variation of graph density while preserving structural heterogeneity. Through comprehensive experiments on synthetic and real-world datasets, we establish an empirical boundary for the average clustering coefficient that facilitates the selection of effective link prediction techniques.
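
To make the selection idea concrete, the following is a minimal sketch, not the authors' implementation. It computes the average clustering coefficient with the standard definition (the mean over all nodes of $2 e_i / (k_i (k_i - 1))$, where $k_i$ is the degree of node $i$ and $e_i$ the number of edges among its neighbors) and maps it to a predictor family using the approximate $0.27$–$0.37$ boundary from the TL;DR. Several things here are assumptions made for illustration: the function names, the choice to treat the interval itself as an undecided zone, and the `degree_difference_score` function, which is a generic stand-in for a degree-dependent score and is not claimed to be the paper's exact HEI or HOI formula.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbors.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def jaccard_score(adj, x, y):
    """Common-neighbor score: |N(x) & N(y)| / |N(x) | N(y)|."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def degree_difference_score(adj, x, y, beta=1.0):
    """Illustrative degree-dependent score (assumed form): |k_x - k_y| ** beta."""
    return abs(len(adj[x]) - len(adj[y])) ** beta


def choose_predictor(avcc, low=0.27, high=0.37):
    """Map AvCC to a predictor family; the (low, high) band is an assumed undecided zone."""
    if avcc < low:
        return "degree-dependent"
    if avcc > high:
        return "common-neighbor"
    return "boundary"


# A triangle with one pendant node (clustered) versus a star (tree-like).
clustered = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
star = build_adjacency([(0, 1), (0, 2), (0, 3), (0, 4)])

print(choose_predictor(average_clustering(clustered)))
print(choose_predictor(average_clustering(star)))
```

On the star graph no two neighbors of the hub are connected, so every local coefficient is zero and the sketch falls back to the degree-dependent family, which matches the abstract's remark about sparse or tree-like graphs.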

Paper Structure (22 sections, 7 equations, 3 figures, 3 tables, 1 algorithm)

Figures (3)

  • Figure 1: The effect of $d$ on Average Clustering Coefficient $AvCC$.
  • Figure 2: AUC results for $S^{HEI}$ and $S^{JAC}$ with rise in the value of $d$ and subsequently the Average Clustering Coefficient denoted by $AvCC$.
  • Figure 3: Changes in the average clustering coefficient with changes in the number of nodes $N$ in a graph.