On the effect of the average clustering coefficient on topology-based link prediction in featureless graphs
Mehrdad Rafiepour, S. Mehdi Vahidipour
TL;DR
The paper investigates whether the average clustering coefficient (AvCC) can serve as a density-aware criterion for choosing among topology-based link predictors in featureless graphs. It develops a Barabási–Albert-based graph alteration method that raises AvCC in a controlled way while preserving heterogeneity, and derives an empirical AvCC boundary (approximately $0.27$–$0.37$) that separates the regime where degree-dependent methods outperform common-neighbor-based scores from the regime where they do not. The boundary is validated on real networks and Open Graph Benchmark datasets: Jaccard-based predictions perform best on denser (higher-AvCC) graphs, while HEI/HOI are preferable on sparser ones. The result is a practical guideline for selecting a topology-based link predictor from graph structure alone, without node features.
Abstract
Link prediction is a fundamental problem in graph theory with diverse applications, including recommender systems, community detection, and identifying spurious connections. While feature-based methods achieve high accuracy, their reliance on node attributes makes them inapplicable to featureless graphs. For such graphs, structure-based approaches, including common-neighbor-based and degree-dependent methods, are commonly employed. However, the effectiveness of these methods depends on graph density: common-neighbor-based algorithms perform well in dense graphs, whereas degree-dependent methods are more suitable for sparse or tree-like graphs. Despite this, the literature lacks a clear criterion for distinguishing dense graphs from sparse ones. This paper introduces the average clustering coefficient as a criterion for assessing graph density to guide the choice of link prediction algorithm. To address the scarcity of datasets for empirical analysis, we propose a novel graph generation method based on the Barabási–Albert model, which enables controlled variation of graph density while preserving structural heterogeneity. Through comprehensive experiments on synthetic and real-world datasets, we establish an empirical boundary for the average clustering coefficient that facilitates the selection of effective link prediction techniques.
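As an illustration of how such a criterion might be applied, the following is a minimal sketch, not the authors' implementation. It computes the average clustering coefficient of an undirected graph and maps it to a family of predictors. The helper names (`build_adj`, `average_clustering`, `jaccard`, `suggest_predictor`) are hypothetical, the thresholds $0.27$ and $0.37$ are taken from the approximate boundary quoted in the TL;DR, and treating values inside that range as undecided is an assumption made here for illustration.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes.

    Nodes with fewer than two neighbors contribute 0 to the average.
    """
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def jaccard(adj, x, y):
    """Jaccard score of a node pair: shared neighbors over all neighbors."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def suggest_predictor(avcc, low=0.27, high=0.37):
    """Map an AvCC value to a predictor family.

    Assumption: the range [low, high] is treated as an undecided zone.
    """
    if avcc < low:
        return "degree-dependent"
    if avcc > high:
        return "common-neighbor"
    return "boundary region"


# A triangle with one pendant node (clustered) and a 4-leaf star (tree-like).
clustered = build_adj([(0, 1), (1, 2), (0, 2), (2, 3)])
star = build_adj([(0, 1), (0, 2), (0, 3), (0, 4)])

clustered_choice = suggest_predictor(average_clustering(clustered))
star_choice = suggest_predictor(average_clustering(star))
```

On these toy graphs the clustered example falls above the upper threshold and the star, which has no triangles, falls below the lower one. Graphs that land inside the range would need both families evaluated on held-out edges.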
