Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning

Jason Piquenot; Maxime Bérar; Pierre Héroux; Jean-Yves Ramel; Romain Raveaux; Sébastien Adam

TL;DR

This work tackles the problem of discovering efficient matrix-based formulae for counting graph substructures (paths and cycles) by searching a space constrained by a Context-Free Grammar (CFG). It introduces Grammar Reinforcement Learning (GRL), a deep RL approach that runs Monte Carlo Tree Search (MCTS) over the CFG and is implemented with Gramformer, a transformer model that emulates a PushDown Automaton (PDA). GRL recovers known Voropaev-style formulae and also discovers novel, more efficient expressions for path counts up to length six, achieving speedups of up to 6.25x. The framework is also adapted to edge-level and directed-graph counting, and the paper outlines how more expressive k-WL CFGs could extend the approach beyond length six, with potential impact on scalable graph analytics and the interpretability of substructure counting.

Abstract

This paper presents Grammar Reinforcement Learning (GRL), a reinforcement learning algorithm that uses Monte Carlo Tree Search (MCTS) and a transformer architecture that models a Pushdown Automaton (PDA) within a context-free grammar (CFG) framework. Taking as a use case the problem of efficiently counting paths and cycles in graphs, a key challenge in network analysis, computer science, biology, and social sciences, GRL discovers new matrix-based formulas for path/cycle counting that improve computational efficiency by factors of two to six compared with state-of-the-art approaches. Our contributions include: (i) a framework for generating gramformers that operate within a CFG, (ii) the development of GRL for optimizing formulas within grammatical structures, and (iii) the discovery of novel formulas for graph substructure counting, leading to significant computational improvements.
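
To make "matrix-based formulas for path/cycle counting" concrete, here is a minimal Python sketch of two classical formulas, checked against brute-force enumeration on a small graph. It is not one of the formulae discovered by GRL. The sentence $J \odot A^2$ appears in the paper's Figure 1; reading $J$ as the all-ones matrix with a zero diagonal is an assumption made here, and the function names are hypothetical.

```python
import itertools
import numpy as np

def two_path_matrix(A):
    """Entry (i, j) counts simple paths of length 2 from i to j, i != j."""
    n = A.shape[0]
    # Assumption: J is the all-ones matrix with zeros on the diagonal.
    J = np.ones((n, n), dtype=A.dtype) - np.eye(n, dtype=A.dtype)
    # (A @ A)[i, j] counts walks i-k-j; the mask J drops closed walks i-k-i.
    return J * (A @ A)

def triangle_count(A):
    """Number of triangles in a simple undirected graph: trace(A^3) / 6."""
    return int(np.trace(A @ A @ A)) // 6

def brute_force_two_paths(A):
    """Reference count obtained by enumerating ordered triples of distinct nodes."""
    n = A.shape[0]
    P = np.zeros_like(A)
    for i, k, j in itertools.permutations(range(n), 3):
        if A[i, k] and A[k, j]:
            P[i, j] += 1
    return P

# Example graph: the 4-cycle 0-1-2-3-0 with the chord 0-2.
A = np.zeros((4, 4), dtype=np.int64)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[u, v] = A[v, u] = 1

formula_matches_brute_force = np.array_equal(
    two_path_matrix(A), brute_force_two_paths(A)
)
num_triangles = triangle_count(A)
```

For length 2 the mask $J$ is enough because a simple graph has no self-loops. Longer paths need correction terms that remove walks revisiting a vertex, which is where the choice of formula starts to affect computational cost.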

Paper Structure (25 sections, 13 theorems, 71 equations, 14 figures, 13 algorithms)

Introduction
Background
Path and cycle counting
Context-Free Grammar
PushDown Automaton
Generating path/cycle counting formula through GRL
From path matrix formulae to the CFG $G_3$
From $G_3$ to the PDA $D_3$
Search in $D_3$ through Grammar Reinforcement Learning
Gramformer
Finding more efficient formulae for counting with RL
Conclusion
CFGs and PDAs
On the evaluation of GRL in the context of path counting
A CFG to count at edge level
...and 10 more sections

Key Result

Theorem 3.1

$G_3$ is as expressive as $3\text{-WL}$

Figures (14)

Figure 1: The left diagram illustrates a path in the derivation tree of the PDA $D_3$ which generates the sentence $J\odot A^2 \in L(G_3)$. The right diagram details the process of generating this sentence, emphasizing the transcription and transposition loops. As depicted, the stack fills during transposition steps and empties during transcription steps, eventually leading to the derivation of a sentence from the language.
Figure 2: From left to right: The agent selects a set of $N$ sentences based on an MCTS heuristic. These sentences are computed for a given set of graphs. The computation is then evaluated against a ground truth, yielding a linear combination of the sentences and a value representing their pertinence. This value is subsequently backpropagated through the MCTS search tree.
Figure 3: In the acting phase, rules are selected based on both the MCTS algorithm and the neural network outputs. Each time MCTS selects a node, the decision, empirical policy, and value of the node are stored in a replay buffer. During the learning phase, the neural network is updated by predicting the policy and value functions based on the decisions stored in the replay buffer.
Figure 4: From PDA to grammar tokens: $D_3$ is turned into three sets of tokens. The corresponding variables of each element of $\delta_r$ are turned into variable tokens. For each variable token, a set of rule tokens is defined. Then, for each corresponding terminal symbol of $\delta_w$, a terminal token is defined. Finally, for each variable token, a variable mask is defined.
Figure 5: The input is read until the first variable token (Rd). This token is passed to the encoder (Enc). The decoder (Dec) receives the encoder output and the input. The first output of the decoder is combined with the mask corresponding to the variable token to generate a policy. The second output is the value.
...and 9 more figures
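
The stack behaviour described for Figure 1 and the per-variable masks described for Figures 4 and 5 can be illustrated with a small Python sketch. The grammar below is a hypothetical toy grammar, not the paper's $G_3$, and the names `RULES`, `rule_mask` and `derive` are illustrative assumptions. The sketch only shows the general technique: a stack-driven leftmost derivation in which the rule chosen for a variable must pass that variable's mask.

```python
HADAMARD = "\u2299"  # the elementwise-product symbol used in the sentence

# Hypothetical toy grammar (NOT the paper's G_3): (left-hand side, right-hand side).
RULES = [
    ("M", ["(", "M", HADAMARD, "M", ")"]),  # 0: Hadamard product
    ("M", ["(", "M", "M", ")"]),            # 1: matrix product
    ("M", ["A"]),                           # 2
    ("M", ["I"]),                           # 3
    ("M", ["J"]),                           # 4
    ("M", ["diag(", "V", ")"]),             # 5
    ("V", ["1"]),                           # 6
    ("V", ["(", "M", "V", ")"]),            # 7
]
VARIABLES = {"M", "V"}

def rule_mask(variable):
    """Boolean mask over all rules: True where the rule rewrites `variable`."""
    return [lhs == variable for lhs, _ in RULES]

def derive(choices, start="M"):
    """Leftmost derivation driven by a stack and a sequence of rule indices."""
    stack, output = [start], []
    choices = iter(choices)
    while stack:
        symbol = stack.pop()
        if symbol in VARIABLES:
            rule = next(choices)
            if not rule_mask(symbol)[rule]:
                raise ValueError(f"rule {rule} is masked out for variable {symbol}")
            # Push the right-hand side reversed so its first symbol is popped next.
            stack.extend(reversed(RULES[rule][1]))
        else:
            # Terminal symbols are written to the output as they leave the stack.
            output.append(symbol)
    return " ".join(output)

# M -> (M ⊙ M) -> (J ⊙ M) -> (J ⊙ (M M)) -> (J ⊙ (A M)) -> (J ⊙ (A A))
sentence = derive([0, 4, 1, 2, 2])
```

In this sketch, applying a rule fills the stack and writing terminals empties it, which is loosely analogous to the fill and empty phases in the Figure 1 caption. In a learned setting, the hard-coded `choices` would presumably be replaced by rule indices sampled from a masked policy.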

Theorems & Definitions (24)

Definition 2.1: Context-Free Grammar
Definition 2.2: Derivation
Definition 2.3: Context-Free Language
Definition 2.4: PushDown Automaton
Theorem 3.1: $3\text{-WL}$ CFG
Theorem 5.1: Efficient path counting
Proposition A.1
proof
Theorem A.1: $3\text{-WL}$ CFG
proof
...and 14 more