Improving Graph Neural Networks on Multi-node Tasks with the Labeling Trick

Xiyuan Wang, Pan Li, Muhan Zhang

TL;DR

This work identifies a fundamental limitation of using standard GNNs for multi-node representation learning: node embeddings learned independently of one another fail to capture dependencies within a target node set. It introduces labeling tricks (set labeling, poset labeling, and subset labeling) and extends them to hypergraphs. The authors prove that, with a sufficiently expressive GNN, labeling tricks make it possible to learn structural representations of node sets, links, posets, and higher-order substructures, and they report empirical gains on undirected link prediction, directed link prediction, hyperedge prediction, and subgraph prediction. The work thereby gives a theoretical foundation for using GNNs in multi-node learning and explains the strong performance of earlier node-labeling-based methods.

Abstract

In this paper, we study using graph neural networks (GNNs) for multi-node representation learning, where a representation for a set of more than one node (such as a link) is to be learned. Existing GNNs are mainly designed to learn single-node representations. When used for multi-node representation learning, a common practice is to directly aggregate the single-node representations obtained by a GNN. In this paper, we show a fundamental limitation of such an approach, namely the inability to capture the dependence among multiple nodes in the node set. A straightforward solution is to distinguish target nodes from others. Formalizing this idea, we propose the labeling trick, which first labels nodes in the graph according to their relationships with the target node set before applying a GNN, and then aggregates the node representations obtained in the labeled graph into a multi-node representation. Besides node sets in graphs, we also extend labeling tricks to posets, subsets, and hypergraphs. Experiments verify that the labeling trick can boost GNNs on various tasks, including undirected link prediction, directed link prediction, hyperedge prediction, and subgraph prediction. Our work explains the superior performance of previous node-labeling-based methods and establishes a theoretical foundation for using GNNs for multi-node representation learning.
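
The pipeline described in the abstract (label nodes relative to the target set, run a GNN, aggregate over the target nodes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: 1-WL colour refinement stands in for a message-passing GNN, the label is a simple 0/1 indicator of membership in the target set, and the graph is a hypothetical 4-node path whose nodes 0..3 play the roles of $v_1,\dots,v_4$.

```python
def refine(adj, init_colors, rounds=3):
    """1-WL colour refinement, used as a stand-in for a message-passing GNN."""
    colors = list(init_colors)
    for _ in range(rounds):
        # New colour = (own colour, sorted multiset of neighbour colours).
        colors = [
            (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in range(len(adj))
        ]
    return colors


def plain_set_repr(adj, target):
    """Aggregate node colours computed WITHOUT knowledge of the target set."""
    colors = refine(adj, [0] * len(adj))
    return tuple(sorted(colors[v] for v in target))


def labeled_set_repr(adj, target):
    """Label target nodes with 1 and all others with 0, then refine and aggregate."""
    labels = [1 if v in target else 0 for v in range(len(adj))]
    colors = refine(adj, labels)
    return tuple(sorted(colors[v] for v in target))


# Hypothetical path graph 0 - 1 - 2 - 3 (adjacency lists).
path = [[1], [0, 2], [1, 3], [2]]

# Nodes 1 and 2 are symmetric, so plain aggregation cannot tell the
# edge {0, 1} apart from the non-adjacent pair {0, 2}.
same_without_labels = plain_set_repr(path, {0, 1}) == plain_set_repr(path, {0, 2})

# With labels, the two node sets receive different representations,
# while the isomorphic pairs {0, 1} and {3, 2} still agree.
differ_with_labels = labeled_set_repr(path, {0, 1}) != labeled_set_repr(path, {0, 2})
iso_agree = labeled_set_repr(path, {0, 1}) == labeled_set_repr(path, {3, 2})

print(same_without_labels, differ_with_labels, iso_agree)
```

In this toy setting the sorted tuple plays the role of a permutation-invariant aggregator. A real model would replace `refine` with a trained GNN and the tuple with a learned pooling function, and the labeling has to be recomputed for every target node set.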

Paper Structure

This paper contains 59 sections, 22 theorems, 55 equations, 7 figures, 10 tables.

Key Result

Proposition 8

(Srinivasan2020On) GAE cannot learn structural multi-node representations, no matter how expressive the node representations learned by its GNN are.

Figures (7)

  • Figure 1: (a) In this graph, nodes $v_2$ and $v_3$ are in the same orbit; links $(v_1,v_2)$ and $(v_4,v_3)$ are isomorphic; link $(v_1,v_2)$ and link $(v_1,v_3)$ are not isomorphic. However, if we aggregate two node representations learned by a GNN as the link representation, we will give $(v_1,v_2)$ and $(v_1,v_3)$ the same prediction. (b) In this graph, nodes $v_3$ and $v_4$ are isomorphic. Aggregating the node embeddings within the subgraph, GNNs will produce equal embeddings for subgraphs $(v_1,v_2,v_3)$ and $(v_1,v_2,v_4)$, although the two subgraphs are not isomorphic. This problem was first observed by you2019position, where it was interpreted as the failure of GNNs to capture node positions, and it was later formalized in Srinivasan2020On.
  • Figure 2: When predicting $(v_1,v_2)$, we will label these two nodes differently from the rest so that a GNN is aware of the target link when learning $v_1$ and $v_2$'s representations. Similarly, when predicting $(v_1,v_3)$, nodes $v_1$ and $v_3$ will be labeled differently. This way, the representation of $v_2$ in the left graph will be different from that of $v_3$ in the right graph, enabling GNNs to distinguish the non-isomorphic links $(v_1,v_2)$ and $(v_1,v_3)$.
  • Figure 3: Set labeling with Graph Neural Networks (GNNs) fails to distinguish between non-isomorphic directed links, such as the edge from $v_1$ to $v_2$ versus the edge from $v_2$ to $v_3$, because it does not account for the order of nodes within the target node pairs.
  • Figure 4: Different Hasse diagrams
  • Figure 5: An example in which the subset labeling trick differentiates two node sets while the set labeling trick does not. First row: labeled graphs. Second row: rooted subtrees of $v$.
  • ...and 2 more figures
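
The point of the Figure 3 caption, that set labeling ignores the order of nodes in a directed target pair, can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: a hypothetical 3-node directed path (not the graph drawn in Figure 3), directed colour refinement as a stand-in for a GNN, and an order-aware labeling that marks the source with 1 and the target with 2, which is one simple choice in the spirit of poset labeling.

```python
def refine_directed(out_adj, init_colors, rounds=3):
    """Colour refinement that keeps in- and out-neighbours separate (GNN stand-in)."""
    n = len(out_adj)
    in_adj = [[] for _ in range(n)]
    for u, nbrs in enumerate(out_adj):
        for v in nbrs:
            in_adj[v].append(u)
    colors = list(init_colors)
    for _ in range(rounds):
        colors = [
            (
                colors[v],
                tuple(sorted(colors[u] for u in in_adj[v])),   # incoming messages
                tuple(sorted(colors[w] for w in out_adj[v])),  # outgoing messages
            )
            for v in range(n)
        ]
    return colors


def set_label_repr(out_adj, src, dst):
    """Set labeling: both endpoints get label 1, so direction is invisible."""
    labels = [1 if v in (src, dst) else 0 for v in range(len(out_adj))]
    colors = refine_directed(out_adj, labels)
    return tuple(sorted((colors[src], colors[dst])))


def order_aware_label_repr(out_adj, src, dst):
    """Order-aware labeling (illustrative): source gets 1, target gets 2."""
    labels = [0] * len(out_adj)
    labels[src], labels[dst] = 1, 2
    colors = refine_directed(out_adj, labels)
    return tuple(sorted((colors[src], colors[dst])))


# Hypothetical directed path 0 -> 1 -> 2 (out-adjacency lists).
dpath = [[1], [2], []]

# The candidate links 0 -> 1 and 1 -> 0 involve the same node set,
# so set labeling yields identical representations for both.
set_fails = set_label_repr(dpath, 0, 1) == set_label_repr(dpath, 1, 0)

# Encoding the order in the labels separates the two directions,
# even though the readout is the same permutation-invariant one.
order_works = order_aware_label_repr(dpath, 0, 1) != order_aware_label_repr(dpath, 1, 0)

print(set_fails, order_works)
```

Both functions use the same readout, so the difference in behaviour comes from the labels alone.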

Theorems & Definitions (41)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Proposition 8
  • Definition 9
  • Definition 10
  • ...and 31 more