CGLE: Class-label Graph Link Estimator for Link Prediction

Ankit Mazumder; Srikanta Bedathur

TL;DR

CGLE introduces class-label guidance into link prediction by constructing a class-conditioned link probability matrix and fusing it with backbone GNN embeddings via an MLP. The method can use either ground-truth or pseudo class labels to capture global priors on inter-class link formation, which improves performance on diverse graphs, including sparse and heterophilous networks. The framework extends NCN/NCNC with a two-phase pipeline: class priors are computed once in preprocessing and integrated with structural signals at prediction time, so the approach remains computationally efficient. Empirical results on a wide range of datasets show substantial gains over strong baselines, validating the utility of semantic priors for link prediction and highlighting CGLE’s practicality and adaptability.
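The preprocessing phase mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an undirected simple graph, one label per node, and a simple frequency estimator in which entry (a, b) is the number of observed training edges between classes a and b divided by the number of possible node pairs between them. The function name `class_link_matrix` is hypothetical.

```python
from collections import Counter


def class_link_matrix(edges, labels, num_classes):
    """Estimate P[a][b]: the fraction of possible (a, b) node pairs that are linked.

    Hypothetical frequency estimator (an assumption, not necessarily CGLE's
    exact normalization). `edges` are undirected training edges (u, v);
    `labels[node]` is that node's class id (ground-truth or pseudo-label).
    """
    # Number of nodes in each class (Counter returns 0 for absent classes).
    class_sizes = Counter(labels)

    # Count observed edges per class pair, keeping the matrix symmetric.
    edge_counts = [[0] * num_classes for _ in range(num_classes)]
    for u, v in edges:
        a, b = labels[u], labels[v]
        edge_counts[a][b] += 1
        if a != b:
            edge_counts[b][a] += 1

    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for a in range(num_classes):
        for b in range(num_classes):
            n_a, n_b = class_sizes[a], class_sizes[b]
            # Possible unordered pairs: within one class vs. across two classes.
            possible = n_a * (n_a - 1) // 2 if a == b else n_a * n_b
            if possible > 0:
                matrix[a][b] = edge_counts[a][b] / possible
    return matrix


# Toy example: nodes 0-2 belong to class 0, nodes 3-4 to class 1.
labels = [0, 0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3)]
print(class_link_matrix(edges, labels, num_classes=2))
# → [[0.667, 0.167], [0.167, 0.0]] (rounded)
```

When ground-truth labels are unavailable, `labels` would instead come from a clustering step; the estimator itself is unchanged. Because the matrix depends only on training edges and labels, it is computed once, before any GNN forward pass.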

Abstract

Link prediction is a pivotal task in graph mining with wide-ranging applications in social networks, recommendation systems, and knowledge graph completion. However, many leading Graph Neural Network (GNN) models neglect the valuable semantic information aggregated at the class level. To address this limitation, this paper introduces CGLE (Class-label Graph Link Estimator), a novel framework designed to augment GNN-based link prediction models. CGLE constructs a class-conditioned link probability matrix, where each entry represents the probability of a link forming between nodes of two given classes. This matrix is derived either from available ground-truth labels or from pseudo-labels obtained through clustering. The resulting class-based prior is then concatenated with the structural link embedding from a backbone GNN, and the combined representation is passed to a Multi-Layer Perceptron (MLP) for the final prediction. Crucially, CGLE's class prior is computed in an efficient preprocessing stage, leaving the computational complexity of the underlying GNN model unaffected. We validate our approach through extensive experiments on a broad suite of benchmark datasets, covering both homophilous and sparse heterophilous graphs. The results show that CGLE yields substantial performance gains over strong baselines such as NCN and NCNC, with improvements in HR@100 of over 10 percentage points on homophilous datasets like Pubmed and DBLP. On sparse heterophilous graphs, CGLE delivers an MRR improvement of over 4% on the Chameleon dataset. Our work underscores the efficacy of integrating global, data-driven semantic priors, presenting a compelling alternative to the pursuit of increasingly complex model architectures. Code to reproduce our findings is available at: https://github.com/data-iitd/cgle-icdm2025.
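The prediction-time fusion described in the abstract can be sketched in pure Python as below. This is a hypothetical minimal example: it assumes the prior fed to the MLP is the single matrix entry for the candidate pair's two classes, and that the backbone (for example NCN) has already produced a structural link embedding `h_uv`. The helper names `fuse` and `TinyMLP`, and the one-hidden-layer MLP with random, untrained weights, are illustrative assumptions that only show the data flow.

```python
import math
import random


def fuse(h_uv, class_matrix, label_u, label_v):
    """Concatenate the structural link embedding with the class-pair prior."""
    prior = class_matrix[label_u][label_v]
    return list(h_uv) + [prior]


class TinyMLP:
    """Hypothetical one-hidden-layer MLP mapping a fused vector to a link probability."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real model would learn these during training.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def __call__(self, x):
        # Hidden layer: affine transform followed by ReLU.
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> probability in (0, 1)


# Toy usage: a 2-class prior matrix and a stand-in backbone embedding.
P = [[0.6, 0.1], [0.1, 0.4]]
h_uv = [0.2, -0.5, 0.9]
x = fuse(h_uv, P, label_u=0, label_v=1)  # [0.2, -0.5, 0.9, 0.1]
mlp = TinyMLP(in_dim=len(x), hidden_dim=8)
print(mlp(x))
```

Under the scalar-prior assumption, fusion adds just one input dimension to the MLP and leaves the backbone GNN untouched, which is consistent with the abstract's point that CGLE does not change the backbone's computational complexity.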
