LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning

Xunkai Li, Meihao Liao, Zhengyu Wu, Daohan Su, Wentao Zhang, Rong-Hua Li, Guoren Wang

TL;DR

LightDiC targets scalable representation learning on large directed graphs by decoupling graph-structure processing from learning and using a magnetic Laplacian-based operator to encode edge direction. Its pipeline has three parts: offline construction of the magnetic graph operator, weight-free $K$-step feature smoothing in the complex domain, and a lightweight linear predictor on the concatenated real and imaginary components. The design is grounded in proximal-gradient and Dirichlet-energy theory. LightDiC matches or exceeds state-of-the-art baselines on seven digraph datasets, including the billion-scale ogbn-papers100M, while using far fewer parameters and training substantially faster, which makes DiGNNs practical to deploy on real-world directed networks.
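
The three-part pipeline can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the added self-loops, the symmetric degree normalization, the phase matrix $2\pi q(\mathbf{A}-\mathbf{A}^\top)$, and the least-squares classifier standing in for the trained linear predictor are all assumptions made for illustration.

```python
import numpy as np

def magnetic_operator(A, q=0.25):
    """Step 1 (offline): normalized magnetic adjacency. Assumed form, not the paper's code."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A_s = 0.5 * (A + A.T) + np.eye(n)        # symmetrized adjacency plus self-loops (assumption)
    theta = 2.0 * np.pi * q * (A - A.T)      # antisymmetric phase matrix encoding direction
    d_inv_sqrt = 1.0 / np.sqrt(A_s.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_s * d_inv_sqrt[None, :]
    return A_norm * np.exp(1j * theta)       # Hermitian complex operator

def smooth_features(op, X, K=2):
    """Step 2: weight-free K-step propagation in the complex domain."""
    Z = np.asarray(X, dtype=complex)
    for _ in range(K):
        Z = op @ Z
    # Concatenate real and imaginary parts so a real-valued linear model can use them.
    return np.concatenate([Z.real, Z.imag], axis=1)

def fit_linear(features, labels, num_classes):
    """Step 3 stand-in: least-squares fit to one-hot labels (not the paper's training loop)."""
    Y = np.eye(num_classes)[labels]
    W, *_ = np.linalg.lstsq(features, Y, rcond=None)
    return W

# Hypothetical toy digraph with edges 0->1, 1->2, 2->0, 2->3.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                  # random node features
labels = np.array([0, 1, 0, 1])              # made-up labels

op = magnetic_operator(A, q=0.25)
feats = smooth_features(op, X, K=2)
W = fit_linear(feats, labels, num_classes=2)
pred = (feats @ W).argmax(axis=1)
```

Because `magnetic_operator` and `smooth_features` have no learnable weights, they can run once before training; only `W` depends on the labels.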

Abstract

Most existing graph neural networks (GNNs) are limited to undirected graphs, whose restricted scope of captured relational information hinders their expressiveness and their deployment in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) better fit the demand for modeling complex topological systems, such as transportation and financial networks, because they capture more intricate relationships between nodes. While some directed GNNs have been introduced, they are mainly inspired by deep learning architectures, which leads to redundant complexity and computation and makes them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC uses directed information to perform message passing in the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably to or even outperforms other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results on the most representative large-scale database (ogbn-papers100M).

Paper Structure (24 sections, 5 theorems, 24 equations, 5 figures, 8 tables)

This paper contains 24 sections, 5 theorems, 24 equations, 5 figures, 8 tables.

Key Result

Lemma 1

The total variation of the digraph signal $\mathbf{X}$ is a smoothness measure that quantifies how much $\mathbf{X}$ changes with respect to the digraph topology encoded in the magnetic Laplacian $\mathbf{L}_m$. It is expressed as a quadratic form, also known as the Dirichlet energy.
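
The lemma's equation is not reproduced in this summary. As a hedged sketch, assume the unnormalized magnetic Laplacian $\mathbf{L}_m=\mathbf{D}_m-\mathbf{A}_m\odot\exp\left(i\Theta^{(q)}\right)$ given in the Figure 5 caption, with $\mathbf{A}_m$ symmetric, $\Theta^{(q)}$ antisymmetric, and $\mathbf{D}_m$ the diagonal matrix of row sums of $\mathbf{A}_m$ (these properties are assumptions based on the standard magnetic Laplacian, not quoted from the lemma). Under those assumptions the quadratic form expands as

$$
\operatorname{tr}\left(\mathbf{X}^\dagger \mathbf{L}_m \mathbf{X}\right)
= \frac{1}{2}\sum_{u,v}\mathbf{A}_m(u,v)\,\left\lVert \mathbf{X}_u - e^{i\Theta^{(q)}(u,v)}\,\mathbf{X}_v \right\rVert_2^2 ,
$$

where $\mathbf{X}_u$ denotes the feature row of node $u$. The energy is small when connected nodes carry similar features after the direction-dependent phase rotation, which is the sense in which it measures smoothness.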

Figures (5)

  • Figure 1: Overview of our proposed LightDiC, including Step 1: predefined magnetic graph operator based on asymmetric digraph adjacency matrix, Step 2: feature pre-processing, and Step 3: model training with processed features.
  • Figure 2: Convergence curves with the relative training time on Epinions and WikiTalk datasets. The shaded area is the result range of 10 runs.
  • Figure 3: Performance of magnetic Laplacian-based DiGNNs.
  • Figure 4: Node-C performance on CoraML under sparsity settings.
  • Figure 5: Illustration of all the eigenvectors of a magnetic Laplacian $\mathbf{L}^{(q)}_m=\mathbf{D}_m-\mathbf{A}_m \odot \exp \left(i \Theta^{(q)}\right)$ with $q=0.25$ of an example graph $\mathcal{G}$. The eigenvectors corresponding to smaller eigenvalues vary more smoothly on graphs. When evaluating the smoothness of a signal $\mathbf{X}$ as $\mathbf{X}^\dagger\mathcal{L}\mathbf{X}$, the smoothness of $\mathbf{u}_1=[0.51e^{i0.31\pi},0.54e^{i0.34\pi},0.49e^{-i0.09\pi},0.46e^{-i0.39\pi}]^{\operatorname{T}}$ is $\lambda_1=0.11$, followed by the smoothness of $\mathbf{u}_2$ ($0.93$), $\mathbf{u}_3$ ($2.07$) and $\mathbf{u}_4$ ($2.89$).
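
The relation between eigenvalues and smoothness described in the Figure 5 caption can be checked numerically. The sketch below is illustrative only: the example graph of the figure is not given here, so the 4-node digraph is hypothetical and the numbers it produces are not expected to match the caption. The definitions of $\mathbf{A}_m$, $\mathbf{D}_m$ and $\Theta^{(q)}$ follow the standard magnetic Laplacian and are assumptions.

```python
import numpy as np

def magnetic_laplacian(A, q=0.25):
    """Unnormalized magnetic Laplacian L = D_m - A_m * exp(i * Theta), assumed standard form."""
    A = np.asarray(A, dtype=float)
    A_m = 0.5 * (A + A.T)                    # symmetrized adjacency
    D_m = np.diag(A_m.sum(axis=1))           # diagonal degree matrix of A_m
    theta = 2.0 * np.pi * q * (A - A.T)      # antisymmetric phase matrix
    return D_m - A_m * np.exp(1j * theta)    # elementwise (Hadamard) product

def smoothness(L, x):
    """Quadratic form x^dagger L x; real-valued because L is Hermitian."""
    return float(np.real(np.vdot(x, L @ x)))  # np.vdot conjugates its first argument

# Hypothetical digraph with edges 0->1, 1->2, 2->0, 2->3 (not the graph from Figure 5).
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])
L = magnetic_laplacian(A, q=0.25)

# eigh returns real eigenvalues in ascending order and unit-norm eigenvectors.
eigvals, eigvecs = np.linalg.eigh(L)
energies = [smoothness(L, eigvecs[:, k]) for k in range(len(eigvals))]
for lam, e in zip(eigvals, energies):
    print(f"eigenvalue {lam:.4f}  smoothness {e:.4f}")
```

For a unit-norm eigenvector the quadratic form equals its eigenvalue, so sorting eigenvectors by eigenvalue also sorts them from smoothest to least smooth.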

Theorems & Definitions (7)

  • Definition 1
  • Lemma 1
  • Definition 2
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5