
Introducing New Node Prediction in Graph Mining: Predicting All Links from Isolated Nodes with Graph Neural Networks

Damiano Zanardini, Emilio Serrano

TL;DR

This paper defines the problem of new node prediction, a zero-shot out-of-graph all-links prediction task in which a brand-new node has no prior connections. It proposes a Graph Neural Network–based architecture with three components: a GNN that produces node embeddings, an example generator, and an MLP-based binary predictor that scores candidate neighbor sets. Experiments on synthetic graphs and a subset of ogbl-citation2 show that the approach can predict a large fraction of the links of new nodes, with performance depending on the graph type, the purity settings, and whether auxiliary node features are present. The work points to practical applications, including a possible bibliography recommendation use case, discusses how noisy training data can improve generalization, and outlines future work such as experiments on more graphs and transformer-style architectures.
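As a rough illustration of the last component, the sketch below scores one candidate neighbor set with a small MLP. It is not the paper's implementation: the mean pooling of candidate embeddings, the concatenation with the new node's features, the layer sizes, the random untrained weights, and every name (`SetScorer`, `mean_pool`, `dense`) are assumptions made only for this example, and the embeddings are made-up numbers standing in for the output of a trained GNN.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty set of equal-length vectors (order-invariant)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class SetScorer:
    """Hypothetical two-layer MLP with random, untrained weights."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]]
        self.b2 = [0.0]

    def score(self, new_node_features: Vector, candidate_embeddings: Sequence[Vector]) -> float:
        """Return a value in (0, 1) for one candidate neighbor set."""
        pooled = mean_pool(candidate_embeddings)  # assumption: mean pooling
        x = list(new_node_features) + pooled      # assumption: concatenation
        hidden = [max(0.0, h) for h in dense(x, self.w1, self.b1)]  # ReLU
        logit = dense(hidden, self.w2, self.b2)[0]
        return 1.0 / (1.0 + math.exp(-logit))     # sigmoid


# Made-up 2-d embeddings standing in for the output of a trained GNN.
embeddings = {"n1": [0.2, 0.9], "n2": [0.1, 0.8], "n5": [0.9, 0.1]}
new_node = [0.3, 0.7]  # made-up auxiliary features of the isolated node

scorer = SetScorer(in_dim=4, hidden_dim=8)
s_a = scorer.score(new_node, [embeddings["n1"], embeddings["n2"]])
s_b = scorer.score(new_node, [embeddings["n2"], embeddings["n1"]])
print(s_a, s_b)
```

Mean pooling is used here only because it makes the score independent of the order in which the candidates are listed; the paper may combine the set differently.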

Abstract

This paper introduces a new problem in the field of graph mining and social network analysis called new node prediction. More technically, the task can be categorized as zero-shot out-of-graph all-links prediction. This challenging problem aims to predict all links from a new, isolated, and unobserved node that was previously disconnected from the graph. Unlike classic approaches to link prediction (including few-shot out-of-graph link prediction), this problem presents two key differences: (1) the new node has no existing links from which to extract patterns for new predictions; and (2) the goal is to predict not just one, but all the links of this new node, or at least a significant part of them. Experiments demonstrate that an architecture based on Deep Graph Neural Networks can learn to solve this challenging problem in a bibliographic citation network.
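The task setup can be sketched on a toy graph: every edge touching a held-out node is hidden, so the node is isolated, and a prediction is judged by how many of the hidden links it recovers. The code below is a minimal sketch under stated assumptions, not the paper's protocol: the edge list is invented, edges are treated as undirected, targets are restricted to training nodes, and the helpers `split_new_nodes` and `link_recall` are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def split_new_nodes(edges: Iterable[Edge], test_nodes: Set[str]):
    """Hide every edge that touches a test node.

    Returns (train_edges, targets), where targets maps each test node to
    the training nodes it was linked to in the full graph.
    """
    train_edges: Set[Edge] = set()
    targets: Dict[str, Set[str]] = {n: set() for n in test_nodes}
    for u, v in edges:
        if u not in test_nodes and v not in test_nodes:
            train_edges.add((u, v))
            continue
        # Keep only links from a test node to a training node as targets.
        if u in test_nodes and v not in test_nodes:
            targets[u].add(v)
        if v in test_nodes and u not in test_nodes:
            targets[v].add(u)
    return train_edges, targets


def link_recall(predicted: Set[str], actual: Set[str]) -> float:
    """Fraction of a new node's true links that were predicted."""
    if not actual:
        return 1.0
    return len(predicted & actual) / len(actual)


# Invented toy graph with 8 nodes; n7 and n8 play the role of new nodes.
edges = [
    ("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5"), ("n5", "n6"),
    ("n7", "n1"), ("n7", "n2"), ("n7", "n3"),
    ("n8", "n5"), ("n8", "n6"),
]
train_edges, targets = split_new_nodes(edges, {"n7", "n8"})

# A made-up prediction for n7: two of its three true links are recovered.
recall_n7 = link_recall({"n1", "n2", "n4"}, targets["n7"])
print(sorted(targets["n7"]), recall_n7)
```

Recall is only one possible measure of "a significant part" of the links; the paper's own evaluation metric may differ.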

Paper Structure (17 sections, 2 equations, 2 figures, 7 tables)

This paper contains 17 sections, 2 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: (a) A tiny graph $G$ with 8 nodes; (b) the training portion $G_{\mathit{train}}$ of it: $\{n_1,n_2,n_3,n_4,n_5,n_6\}$ are the training nodes, whereas $\{n_7,n_8\}$ are the test nodes; (c) the graph $G_{\mathit{train}}^-$ of negative edges; (d) some (positive or negative) training examples with $\mathit{minPure} = 0.7$ and $\mathit{maxSpurious} = 0.5$ (target nodes are boxed; spurious edges are dotted).
  • Figure 2: A Deep Learning architecture for the new node prediction problem.

Theorems & Definitions (2)

  • Example 1
  • Example 2