
Graph-level Representation Learning with Joint-Embedding Predictive Architectures

Geri Skenderi, Hang Li, Jiliang Tang, Marco Cristani

TL;DR

Graph-JEPA introduces a graph-level Joint-Embedding Predictive Architecture that learns semantic representations by predicting latent embeddings of masked subgraphs from a context subgraph, operating entirely in latent space without negative samples. The method partitions graphs into subgraphs, encodes them with GNNs, and uses a simple predictor to locate subgraph targets on a 2D unit hyperbola, thereby inducing a hierarchical, hyperbolic latent structure. Empirical results on diverse graph-classification and regression tasks show competitive or state-of-the-art performance with favorable training efficiency, and ablations demonstrate the benefits of hyperbolic latent prediction, RWSE positional embeddings, and structured partitioning. The work highlights the practicality and effectiveness of latent self-predictive SSL for graphs and points to future extensions to node/edge-level tasks and theoretical analysis of latent geometry.
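The TL;DR credits RWSE (random-walk structural encoding) positional embeddings as one of the ingredients that matter in the ablations. The sketch below shows the standard node-level RWSE construction under stated assumptions; it is not the authors' code. Each node $i$ receives the return probabilities $(P^k)_{ii}$ for $k = 1, \dots, K$, with $P = D^{-1}A$ the random-walk transition matrix. The source does not specify how node encodings become a per-subgraph positional embedding, so the mean over a subgraph's nodes in the hypothetical helper `patch_rwse` is an assumption.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbors


def rwse(adj: Graph, k_steps: int) -> Dict[int, List[float]]:
    """Return, for each node i, [(P^1)_ii, ..., (P^k)_ii] with P = D^-1 A."""
    nodes = sorted(adj)
    rows = {}
    for i in nodes:
        # Probability distribution of a random walker that starts at node i.
        dist = {j: 0.0 for j in nodes}
        dist[i] = 1.0
        feats = []
        for _ in range(k_steps):
            nxt = {j: 0.0 for j in nodes}
            for u, p in dist.items():
                if p == 0.0 or not adj[u]:
                    continue  # no mass here, or an isolated node with no outgoing step
                share = p / len(adj[u])  # uniform step to each neighbor
                for v in adj[u]:
                    nxt[v] += share
            dist = nxt
            feats.append(dist[i])  # probability of being back at i after this step
        rows[i] = feats
    return rows


def patch_rwse(node_pe: Dict[int, List[float]], patch: List[int]) -> List[float]:
    """Hypothetical subgraph-level embedding: mean of the patch nodes' RWSE vectors."""
    k = len(next(iter(node_pe.values())))
    return [sum(node_pe[n][s] for n in patch) / len(patch) for s in range(k)]
```

On the 4-cycle `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}` with `k_steps=4`, every node gets `[0.0, 0.5, 0.0, 0.5]`: a walk on a bipartite graph can only return after an even number of steps, which is the kind of structural signal RWSE exposes.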

Abstract

Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal y from the latent representation of a context signal x. JEPAs bypass the need for the negative and positive samples traditionally required by contrastive learning, while avoiding the overfitting issues associated with generative pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm by proposing a Graph Joint-Embedding Predictive Architecture (Graph-JEPA). In particular, we employ masked modeling and focus on predicting the latent representations of masked subgraphs starting from the latent representation of a context subgraph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative prediction objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Through multiple experimental evaluations, we show that Graph-JEPA can learn highly semantic and expressive representations, as evidenced by downstream performance on graph classification, regression, and distinguishing non-isomorphic graphs. The code is available at https://github.com/geriskenderi/graph-jepa.
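The abstract describes the prediction target as a point on the unit hyperbola in the 2D plane. A minimal sketch of that idea follows, under loud assumptions that are not taken from the paper: each target embedding is collapsed to a single scalar (here, its mean) used as the hyperbolic angle $\alpha$, the point is $(\cosh \alpha, \sinh \alpha)$, and the squared Euclidean distance stands in for the regression loss $D$. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def hyperbola_coords(embedding: List[float]) -> Point:
    """Place an embedding on the right branch of the unit hyperbola x^2 - y^2 = 1.

    Assumption: the vector is reduced to one scalar (its mean), which is used
    as the hyperbolic angle alpha, giving (cosh(alpha), sinh(alpha)).
    """
    alpha = sum(embedding) / len(embedding)
    return (math.cosh(alpha), math.sinh(alpha))


def latent_distance(pred: Point, target: Point) -> float:
    """Squared Euclidean distance; an assumed stand-in for the loss D."""
    return (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2


def jepa_loss(pred_coords: List[Point], target_embeddings: List[List[float]]) -> float:
    """Mean distance between predicted points and target points for one context."""
    targets = [hyperbola_coords(z) for z in target_embeddings]
    return sum(latent_distance(p, t) for p, t in zip(pred_coords, targets)) / len(targets)
```

Because $\cosh^2 \alpha - \sinh^2 \alpha = 1$ for every real $\alpha$, each target lands on the curve regardless of its embedding, so the predictor only has to recover a position along a one-dimensional curve rather than a full embedding vector.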

Paper Structure (27 sections, 9 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: Illustration of the SSL approaches discussed in this paper: (a) Joint-Embedding (Contrastive) Architectures learn to create similar embeddings for inputs $x$ and $y$ that are compatible with each other and dissimilar embeddings otherwise. This compatibility is implemented in practice by creating different views of the input data. (b) Generative Architectures reconstruct a signal $y$ from an input signal $x$ by conditioning the decoder network on additional (potentially latent) variables $z$. (c) Joint-Embedding Predictive Architectures act as a bridge: they utilize a predictor network that processes the context $x$ and is conditioned on additional (potentially latent) variables to predict the embedding of the target $y$ in latent space.
  • Figure 2: A complete overview of Graph-JEPA. We first extract non-overlapping subgraphs (patches) (a.), perform a 1-hop neighborhood expansion (b.), and encode the subgraphs with a GNN to learn feature vectors for the context and target (c.). Afterward, the context and target feature vectors are fed into their respective encoders (d.). The embeddings generated by the target encoder yield the target subgraphs' hyperbolic coordinates $\psi_y$. Meanwhile, the encoded context is fed into a predictor network, which is also conditioned on the positional embedding of the target subgraph, to predict the coordinates $\hat{\psi}_y$ of the target subgraph (e.). A regression loss $D$, based on the distance in latent space, acts as the learning objective (f.). Note that the extracted subgraphs in (a.) and (b.) are for illustrative purposes only: in practice, we use multiple target subgraphs for a given context. Furthermore, the number of nodes in each subgraph can vary.
  • Figure 3: 3D t-SNE (van2008visualizing) visualization of the latent representations used to train the linear classifier on the DD dataset. The change in the curvature of the embedding produced by the Graph-JEPA objective (b.) is noticeable. Best viewed in color.
  • Figure 4: Visualization of the partition of a small graph from the MUTAG dataset used as input (a.) and the corresponding (learned) target embeddings in the hyperbolic plane produced by Graph-JEPA (b.). Best viewed in color.
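Steps (a.) and (b.) of Figure 2 (non-overlapping patches, then a 1-hop neighborhood expansion) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the partition is taken as given (the partitioning algorithm itself is not shown), and picking one context patch and `n_targets` target patches uniformly at random via the hypothetical helper `context_and_targets` is an illustrative choice. The GNN encoding in (c.) is omitted.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbors


def expand_one_hop(adj: Graph, patch: Set[int]) -> Set[int]:
    """Figure 2 (b.): add every neighbor of a patch node to the patch."""
    expanded = set(patch)
    for node in patch:
        expanded.update(adj[node])
    return expanded


def context_and_targets(adj: Graph, partition: List[Set[int]], n_targets: int,
                        seed: int = 0) -> Tuple[Set[int], List[Set[int]]]:
    """Hypothetical sampling: one expanded patch as context, n_targets others as targets."""
    rng = random.Random(seed)
    patches = [expand_one_hop(adj, p) for p in partition]
    order = list(range(len(patches)))
    rng.shuffle(order)  # random, non-repeating choice of patch indices
    context = patches[order[0]]
    targets = [patches[i] for i in order[1:1 + n_targets]]
    return context, targets
```

On the path graph 0-1-2-3-4-5 with partition `[{0, 1}, {2, 3}, {4, 5}]`, `expand_one_hop` turns `{2, 3}` into `{1, 2, 3, 4}`. After expansion, neighboring patches share boundary nodes (here `{0, 1, 2}` and `{1, 2, 3, 4}` share nodes 1 and 2), so a context patch carries some information about the border of each adjacent target.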