Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language Models

Alessandro De Bellis, Salvatore Bufi, Giovanni Servedio, Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio

TL;DR

TyleR is a Type-less yet type-awaRe approach for subgraph-based inductive link prediction that uses PLMs for semantic enrichment. It outperforms state-of-the-art baselines in scenarios with scarce type annotations and sparse graph connectivity.

Abstract

Inductive link prediction is emerging as a key paradigm for real-world knowledge graphs (KGs), where new entities frequently appear and models must generalize to them without retraining. Predicting links in this setting means reasoning about previously unseen entities by leveraging generalizable node features such as subgraph structure, type annotations, and ontological constraints. However, explicit type information is often lacking or incomplete. Even when available, type information in most KGs is coarse-grained, sparse, and prone to errors due to human annotation. In this work, we explore the potential of pre-trained language models (PLMs) to enrich node representations with implicit type signals. We introduce TyleR, a Type-less yet type-awaRe approach for subgraph-based inductive link prediction that leverages PLMs for semantic enrichment. Experiments on standard benchmarks demonstrate that TyleR outperforms state-of-the-art baselines in scenarios with scarce type annotations and sparse graph connectivity. To ensure reproducibility, we share our code at https://github.com/sisinflab/tyler.

Paper Structure

This paper contains 26 sections, 17 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Overview of TyleR. The process begins with ① extracting the enclosing subgraph and ② applying a node labeling strategy. Multi-faceted, semantic representations are then derived using a pre-trained language model ③a, ③b. Finally, a graph neural network ④ integrates structural and semantic information to obtain the final prediction.
  • Figure 2: Link Prediction (Hits@10) evaluation under varying structural sparsity conditions (i.e., the number of edges L in the enclosing subgraph of the target triple, including the target triple) on FB237-V1 (top) and YAGO21K-610 (bottom).
  • Figure 3: Hits@10 performance across four type sparsity groups for three FB237 variants, computed according to the number of explicit types linked to each entity (details in Section \ref{sec:rq3_sparsity}). The groups, from left to right, represent scenarios with an increasing number of explicit types associated with the known entity.
  • Figure 4: PCA visualization of last-layer embeddings from the ontology-enhanced model of Zhou et al. (2023) for 50 candidate entities when predicting the missing tail of the triple <Andrei Gashkin, playsFor, ?>. None of the 50 candidates has an enclosing subgraph.
  • Figure 5: PCA visualization of last-layer embeddings from TyleR (RoBERTa-L) for 50 candidate entities when predicting the missing tail of the triple <Andrei Gashkin, playsFor, ?>. None of the 50 candidates has an enclosing subgraph.
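
Steps ① and ② of Figure 1, and the edge count L used in Figure 2, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a GraIL-style definition, which the summary above does not confirm: the enclosing subgraph is the intersection of the k-hop neighborhoods of the head and tail, and each node is labeled with its pair of distances to the two target entities. The function names, the parameter `k`, and the toy triples are all hypothetical.

```python
from collections import deque


def k_hop_distances(adj, source, k, blocked=None):
    """BFS distances from `source` up to k hops, never visiting `blocked`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr == blocked or nbr in dist:
                continue
            dist[nbr] = dist[node] + 1
            queue.append(nbr)
    return dist


def enclosing_subgraph(triples, head, tail, k=2):
    """Return (nodes, labels, edges) for the target pair (head, tail).

    Assumption: GraIL-style extraction and double-radius labeling.
    """
    # Treat the KG as undirected for neighborhood computation.
    adj = {}
    for h, _, t in triples:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)

    # Distances to each target, computed without passing through the other.
    d_head = k_hop_distances(adj, head, k, blocked=tail)
    d_tail = k_hop_distances(adj, tail, k, blocked=head)

    # Step 1: keep nodes reachable from both targets, plus the targets.
    nodes = (set(d_head) & set(d_tail)) | {head, tail}

    # Step 2: label each node with (distance to head, distance to tail).
    labels = {n: (d_head[n], d_tail[n]) for n in nodes if n not in (head, tail)}
    labels[head] = (0, 1)
    labels[tail] = (1, 0)

    # Edges induced by the kept nodes; the target triple is included if present.
    edges = [(h, r, t) for h, r, t in triples if h in nodes and t in nodes]
    return nodes, labels, edges


# Toy KG (hypothetical data); the target triple is ("a", "r3", "c").
toy_triples = [
    ("a", "r1", "b"),
    ("b", "r2", "c"),
    ("a", "r3", "c"),
    ("c", "r1", "d"),
]
nodes, labels, edges = enclosing_subgraph(toy_triples, "a", "c", k=2)
print(sorted(nodes), labels, len(edges))
```

Under this definition, when the two neighborhoods do not intersect, only the two target nodes remain and the subgraph carries almost no structural signal. That may correspond to the "no enclosing subgraph" case in Figures 4 and 5, where a model has to rely on other node features such as the PLM-derived representations.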