Provably Powerful Graph Neural Networks for Directed Multigraphs

Béni Egressy, Luc von Niederhäusern, Jovan Blanusa, Erik Altman, Roger Wattenhofer, Kubilay Atasu

TL;DR

This paper tackles the challenge of learning on directed multigraphs by proposing three simple adaptations (reverse message passing, directed multigraph port numbering, and ego IDs) that transform standard GNNs into provably powerful directed multigraph learners. The authors prove that, when combined, these adaptations enable a GNN to detect any directed subgraph pattern, and they validate the theory on synthetic pattern-detection tasks and on financial-crime datasets (money laundering and phishing). Empirically, the adaptations yield large gains in minority-class F1 score over baselines: reverse message passing (MP) drives accuracy on the degree-out task, port numbering enables fan-in and fan-out detection, and ego IDs contribute on selected complex patterns. The approach offers a scalable, extensible framework for directed multigraph analytics beyond finance, with avenues for future work in broader domains and in complexity-aware trade-offs.

Abstract

This paper analyses a set of simple adaptations that transform standard message-passing Graph Neural Networks (GNN) into provably powerful directed multigraph neural networks. The adaptations include multigraph port numbering, ego IDs, and reverse message passing. We prove that the combination of these theoretically enables the detection of any directed subgraph pattern. To validate the effectiveness of our proposed adaptations in practice, we conduct experiments on synthetic subgraph detection tasks, which demonstrate outstanding performance with almost perfect results. Moreover, we apply our proposed adaptations to two financial crime analysis tasks. We observe dramatic improvements in detecting money laundering transactions, improving the minority-class F1 score of a standard message-passing GNN by up to 30%, and closely matching or outperforming tree-based and GNN baselines. Similarly impressive results are observed on a real-world phishing detection dataset, boosting three standard GNNs' F1 scores by around 15% and outperforming all baselines.

Paper Structure

This paper contains 46 sections, 7 theorems, 4 equations, 9 figures, 14 tables, and 2 algorithms.

Key Result

Proposition 4.1

An MPNN with sum aggregation and reverse MP can solve degree-out.
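
To make the proposition concrete, here is a minimal sketch of why sum aggregation along reversed edges recovers out-degree. It is not the authors' implementation: the helper `sum_aggregate`, the toy edge list, and the all-ones node features are assumptions chosen for illustration. With constant features, standard message passing sums over incoming edges (giving in-degree), while reverse message passing sums over outgoing edges (giving out-degree), with parallel edges counted separately.

```python
from collections import defaultdict


def sum_aggregate(edges, features, reverse=False):
    """One round of sum aggregation over a directed multigraph.

    edges: list of (src, dst) pairs; parallel edges appear repeatedly.
    reverse=False: a node sums features arriving on its incoming edges.
    reverse=True:  a node sums features arriving on its outgoing edges
                   (messages travel against the edge direction).
    """
    totals = defaultdict(float)
    for src, dst in edges:
        receiver, sender = (src, dst) if reverse else (dst, src)
        totals[receiver] += features[sender]
    # Nodes that received no message get 0.0.
    return {node: totals[node] for node in features}


# Hypothetical toy multigraph: a -> c appears twice (parallel edges).
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("a", "c")]
ones = {node: 1.0 for node in "abcd"}

in_deg = sum_aggregate(edges, ones)                 # standard direction
out_deg = sum_aggregate(edges, ones, reverse=True)  # reverse MP
print(in_deg)
print(out_deg)
```

In this toy graph, nodes `a` and `b` receive nothing under standard directed message passing, so they look identical; the reverse pass is what separates them.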

Figures (9)

  • Figure 1: Money Laundering Patterns. The gray fill indicates the nodes to be detected by the synthetic pattern detection tasks. The exact degree/fan pattern sizes here are for illustrative purposes only.
  • Figure 2: Nodes ($a$ and $b$) with different out-degrees are not distinguishable by a standard MPNN with directed message passing. Note that naive bidirectional message passing, on the other hand, is unable to distinguish nodes $a$ and $d$.
  • Figure 3: Nodes (in gray) with different fan-ins that are not distinguishable by a standard MPNN. The edge labels indicate incoming and outgoing port numbers, respectively.
  • Figure 4: Example of money laundering in a network of financial transactions. Alice and Bob stay at a hotel, which is run by a criminal group headed by Tim. Sean has some dirty money (red dollars) from criminal activities that he wants to transfer to Tim. They use the hotel for laundering the money. They mix the dirty cash with clean money from guests, pay different contractors for supplies, and then transfer these payments to Tim. The money transfer from Sean to the hotel is a cash payment hidden from banks and financial authorities (dotted edge). However, the scatter-gather pattern (bold, red edges) could be revealing of a money laundering scheme.
  • Figure 5: Sato et al. (2019) attach only the local port numbers to received messages, as indicated on the left. Port numbers are indicated in blue. For example, node $c$ attaches port number $1$ to all messages from $d$. So in particular, nodes $a$, $b$, and $c$ can never be distinguished. We use the port numbers on either side of an edge as edge features, so both port numbers are seen by both incident nodes. In this example, the local-only scheme might not be a problem, since $a$, $b$, and $c$ likely have the same ground truth labels, but it means that unique node IDs cannot be generated or propagated.
  • ...and 4 more figures
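
The port-numbering idea in the captions for Figures 3 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the function `port_numbers` is hypothetical, ports are assigned in order of first appearance in the edge list, parallel edges to or from the same neighbour are assumed to share a port, and fan-in is taken to mean the number of distinct in-neighbours. Each edge ends up carrying both its incoming port (assigned by the destination) and its outgoing port (assigned by the source) as edge features, so both endpoints see both numbers.

```python
def port_numbers(edges):
    """Attach (in_port, out_port) to every edge of a directed multigraph.

    edges: list of (src, dst) pairs; parallel edges appear repeatedly.
    Assumption: ports start at 1, follow first-appearance order, and
    parallel edges between the same pair of nodes reuse the same port.
    """
    in_ports = {}   # dst -> {src: port assigned by dst}
    out_ports = {}  # src -> {dst: port assigned by src}
    labelled = []
    for src, dst in edges:
        in_map = in_ports.setdefault(dst, {})
        out_map = out_ports.setdefault(src, {})
        # setdefault keeps an existing port, otherwise assigns the next one.
        in_port = in_map.setdefault(src, len(in_map) + 1)
        out_port = out_map.setdefault(dst, len(out_map) + 1)
        labelled.append((src, dst, in_port, out_port))
    return labelled


# Hypothetical toy multigraph: a -> x appears twice.
edges = [("a", "x"), ("b", "x"), ("a", "x"), ("x", "c")]
labelled = port_numbers(edges)

# In-degree counts edges; the largest incoming port counts distinct senders.
in_degree_x = sum(1 for _, dst, _, _ in labelled if dst == "x")
fan_in_x = max(p_in for _, dst, p_in, _ in labelled if dst == "x")
print(labelled)
print(in_degree_x, fan_in_x)
```

Here node `x` has three incoming edges but only two distinct senders, which is the distinction that degree counting alone cannot make.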

Theorems & Definitions (10)

  • Proposition 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Theorem 4.4
  • Corollary 4.4.1
  • Theorem 4.5
  • Corollary 4.5.1
  • proof
  • proof
  • proof