DUPLEX: Dual GAT for Complex Embedding of Directed Graphs

Zhaoru Ke, Hang Yu, Jianguo Li, Haipeng Zhang

TL;DR

DUPLEX tackles the challenge of embedding directed graphs by using a Hermitian adjacency matrix (HAM) to encode both connectivity and direction, paired with a dual GAT encoder that separately learns amplitude and phase components. The model employs two parameter-free decoders to reconstruct the HAM in a self-supervised fashion, enabling task-agnostic embeddings that generalize to unseen nodes. Empirical results across five digraph datasets show strong improvements, particularly for low-degree nodes, and demonstrate robustness in both inductive and transductive settings, as well as across multiple downstream tasks. The approach offers a scalable, inductive alternative to spectral methods for directed graph analysis and downstream predictive tasks.

Abstract

Current directed graph embedding methods build upon undirected techniques but often inadequately capture directed edge information, leading to challenges such as: (1) suboptimal representations for nodes with low in/out-degrees, due to insufficient neighbor interactions; (2) limited inductive ability for representing new nodes post-training; (3) narrow generalizability, as training is overly coupled with specific tasks. In response, we propose DUPLEX, an inductive framework for complex embeddings of directed graphs. It (1) leverages Hermitian adjacency matrix decomposition for comprehensive neighbor integration, (2) employs a dual GAT encoder for directional neighbor modeling, and (3) features two parameter-free decoders to decouple training from particular tasks. DUPLEX outperforms state-of-the-art models, especially for nodes with sparse connectivity, and demonstrates robust inductive capability and adaptability across various tasks. The code is available at https://github.com/alipay/DUPLEX.
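
To make the idea of a Hermitian adjacency matrix concrete, here is a minimal sketch in plain Python. It is not taken from the DUPLEX repository. It assumes one common convention (entry $i$ for a one-way edge $u \to v$, $-i$ for the reverse direction, $1$ for a bidirectional edge, $0$ for no edge), which may differ from the exact definition used in the paper. The helper names `hermitian_adjacency` and `is_hermitian` are hypothetical.

```python
import cmath

def hermitian_adjacency(num_nodes, edges):
    """Build a Hermitian adjacency matrix as nested lists of complex numbers.

    Assumed convention (not confirmed by this summary):
      u -> v only   : H[u][v] = i,  H[v][u] = -i
      u <-> v       : H[u][v] = H[v][u] = 1
      no edge       : 0
    Self-loops are ignored in this sketch.
    """
    edge_set = set(edges)
    H = [[0j] * num_nodes for _ in range(num_nodes)]
    for u, v in edge_set:
        if u == v:
            continue
        if (v, u) in edge_set:
            H[u][v] = 1 + 0j      # bidirectional edge: real entry
        else:
            H[u][v] = 1j          # forward direction
            H[v][u] = -1j         # reverse direction (complex conjugate)
    return H

def is_hermitian(H):
    """Check H[u][v] == conj(H[v][u]) for every pair of nodes."""
    n = len(H)
    return all(H[u][v] == H[v][u].conjugate() for u in range(n) for v in range(n))

# Toy digraph: 0 -> 1 is one-way, 1 <-> 2 is bidirectional.
edges = [(0, 1), (1, 2), (2, 1)]
H = hermitian_adjacency(3, edges)

# Amplitude encodes whether an edge exists; phase encodes its direction.
amplitude = [[abs(z) for z in row] for row in H]
phase = [[cmath.phase(z) for z in row] for row in H]
```

Under this convention the amplitude matrix is the symmetric (undirected) adjacency, while the phase is $\pi/2$, $-\pi/2$, or $0$ depending on direction, which mirrors the amplitude/phase split described in the TL;DR.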

Paper Structure (38 sections, 2 theorems, 23 equations, 8 figures, 14 tables)

Key Result

Lemma 2.1

For matrices ${\bm{A}}\in {\mathbb{R}}^{m\times n}$ and ${\bm{B}}\in {\mathbb{R}}^{n\times s}$ with ${\bm{A}}{\bm{B}}={\bm{0}}$, it holds that $r({\bm{A}}) + r({\bm{B}})\leq n$.
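
The reasoning behind the bound is short: ${\bm{A}}{\bm{B}}={\bm{0}}$ places every column of ${\bm{B}}$ in the null space of ${\bm{A}}$, which has dimension $n - r({\bm{A}})$, so $r({\bm{B}})\leq n - r({\bm{A}})$. The sketch below checks the inequality on a small hand-picked example using exact rational arithmetic. The matrices and the helper names `rank` and `matmul` are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from all rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# A is 2x3 (m = 2, n = 3) and B is 3x1 (n = 3, s = 1), chosen so that AB = 0.
A = [[1, 0, 0],
     [0, 1, 0]]
B = [[0],
     [0],
     [1]]

n = len(B)                    # shared inner dimension
product = matmul(A, B)        # expected to be the 2x1 zero matrix
rank_sum = rank(A) + rank(B)  # the lemma asserts rank_sum <= n
```

In this example the bound is tight: the column of ${\bm{B}}$ spans the entire null space of ${\bm{A}}$.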

Figures (8)

  • Figure 1: (a) Many nodes in real digraphs have zero in/out-degree. (b) Separating in/out-degree yields more low-degree nodes compared to the total degree that disregards direction. (c) Lack of in-neighbors hinders the dual embedding methods in capturing node $c$'s source role.
  • Figure 2: The architecture of DUPLEX. (a) Forward pass and backward pass of the model. (b) The undirected ($\oplus$) and directed ($\overset{\rightharpoonup}{\oplus}$) graph aggregator. (c) The main idea of the direction-aware decoder.
  • Figure 3: Link existence prediction AUC (%) under extremely low-degree setting.
  • Figure 4: (a) Loss curve. (b) Mean square error between approximated HAM and the ground truth.
  • Figure 5: Node classification macro $F_1$ (%) and micro $F_1$ (%) with self-supervised training on randomly initialized graphs. Here 'early' denotes the 'early-fusion' strategy, 'mid' denotes 'mid-fusion', 'late' denotes 'late-fusion', and 'all' denotes 'all-fusion'. The dashed line is the baseline result with no fusion layer.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Lemma 2.1: Sylvester's rank inequality
  • Lemma 2.2
  • proof