LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning
Xunkai Li, Meihao Liao, Zhengyu Wu, Daohan Su, Wentao Zhang, Rong-Hua Li, Guoren Wang
TL;DR
LightDiC addresses scalable representation learning on large-scale directed graphs by decoupling graph-structure processing from learning and using a magnetic Laplacian-based operator to encode edge direction. Its pipeline has three parts: offline construction of the magnetic graph operator, weight-free $K$-step feature smoothing in the complex domain, and a lightweight linear predictor on the concatenated real and imaginary components. The design is grounded in proximal-gradient and Dirichlet-energy theory. LightDiC matches or exceeds state-of-the-art baselines on seven digraph datasets, including the large-scale ogbn-papers100M, while using far fewer parameters and training substantially faster. The result is a simple, principled framework that makes directed GNNs (DiGNNs) practical to deploy on real-world directed networks.
Abstract
Most existing graph neural networks (GNNs) are limited to undirected graphs, and the restricted scope of the relational information they capture hinders their expressiveness and their deployment in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) are better suited to modeling complex topological systems because they capture more intricate relationships between nodes, as in transportation and financial networks. While some directed GNNs have been introduced, their inspiration mainly comes from deep learning architectures, which leads to redundant complexity and computation and makes them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC utilizes directed information to achieve message passing in the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably to, or even outperforms, other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results on the most representative large-scale database (ogbn-papers100M).
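The decoupled pipeline described above (offline operator construction, weight-free smoothing, linear prediction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the standard magnetic adjacency $\hat{A}_s \odot \exp(i\,2\pi q\,(A - A^\top))$ with symmetric normalization and self-loops. The function names `magnetic_operator`, `smooth`, and `fit_linear`, the value $q = 0.25$, the dense matrices, and the ridge-regression stand-in for the linear predictor are all hypothetical choices made for illustration.

```python
import numpy as np

def magnetic_operator(A, q=0.25):
    """Normalized magnetic adjacency of a digraph (A[u, v] = 1 for edge u -> v).

    Assumed form: D^{-1/2} (A_s + I) D^{-1/2} * exp(i * 2*pi*q * (A - A^T)),
    where A_s is the symmetrized adjacency. Self-loops are an assumption.
    """
    A = np.asarray(A, dtype=float)
    A_s = 0.5 * (A + A.T) + np.eye(A.shape[0])   # undirected magnitude + self-loops
    deg = A_s.sum(axis=1)                        # strictly positive due to self-loops
    d_inv_sqrt = deg ** -0.5
    A_norm = d_inv_sqrt[:, None] * A_s * d_inv_sqrt[None, :]
    theta = 2.0 * np.pi * q * (A - A.T)          # antisymmetric phase encodes direction
    return A_norm * np.exp(1j * theta)           # Hermitian complex operator

def smooth(op, X, K):
    """Weight-free K-step propagation; returns [real | imag] features."""
    Z = X.astype(complex)
    for _ in range(K):
        Z = op @ Z
    return np.concatenate([Z.real, Z.imag], axis=1)

def fit_linear(F, Y, lam=1e-2):
    """Ridge regression onto one-hot labels (stand-in for a linear predictor)."""
    d = F.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(d), F.T @ Y)

# Toy digraph: directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0 with one-hot node features.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[u, v] = 1.0
X = np.eye(4)

op = magnetic_operator(A, q=0.25)   # computed once, offline
F = smooth(op, X, K=2)              # no learnable weights involved
Y = np.eye(2)[[0, 1, 0, 1]]         # toy two-class labels
W = fit_linear(F, Y)                # the only trained component
pred = (F @ W).argmax(axis=1)
```

Because the operator and the smoothed features are computed once before training, only the linear map `W` is learned, which is the source of the scalability the abstract describes. At real scale a sparse matrix format would replace the dense arrays used here.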
