Vejde: A Framework for Inductive Deep Reinforcement Learning Based on Factor Graph Color Refinement

Jakob Nyberg, Pontus Johnson

TL;DR

Vejde is a framework that merges graph-based relational representations with model-free reinforcement learning to produce inductive policies for structured MDPs. By encoding states as facts, converting them into bipartite graphs, and applying message-passing neural network (MPNN) color refinement, Vejde yields latent embeddings from which actions and state values are computed via an inductive policy and value function. Through imitation learning against Prost and PPO-based RL across eight RDDL domains, the approach demonstrates generalization to unseen instances and performance competitive with instance-specific baselines, while sometimes underperforming Prost due to planning costs and exploration challenges. The work highlights the practical potential of structure-aware neural policies in relational domains and provides a Python library to facilitate applying these ideas to new problem areas, especially where data is naturally relational. Overall, Vejde offers a scalable, inductive alternative to fully instance-specific models and deep planning when deploying agents over varied, complex relational environments.

Abstract

We present and evaluate Vejde, a framework that combines data abstraction, graph neural networks and reinforcement learning to produce inductive policy functions for decision problems with richly structured states, such as object classes and relations. MDP states are represented as databases of facts about entities, and Vejde converts each state to a bipartite graph, which is mapped to latent states through neural message passing. The factored representation of both states and actions allows Vejde agents to handle problems of varying size and structure. We tested Vejde agents on eight problem domains defined in RDDL, with ten problem instances each, where policies were trained using both supervised and reinforcement learning. To test policy generalization, we separated problem instances into two sets, one for training and the other solely for testing. Test results on unseen instances for the Vejde agents were compared to MLP agents trained on each problem instance, as well as the online planning algorithm Prost. Our results show that Vejde policies on average generalize to the test instances without a significant loss in score. Additionally, the inductive agents received scores on unseen test instances that on average were close to those of the instance-specific MLP agents.
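
The state-to-graph step described above can be illustrated with a minimal sketch. It is not the Vejde library's API: the fact format, the function names (`facts_to_bipartite`, `refine`) and the block-stacking example are all assumptions made for illustration. Plain color refinement with tuples stands in for the learned neural message passing, which in Vejde would produce vector embeddings instead.

```python
# Hypothetical facts: (predicate, entity arguments...). Illustrative only.
FACTS = [("on", "a", "b"), ("on", "b", "c"), ("clear", "a")]


def facts_to_bipartite(facts):
    """Build a bipartite graph: one node per entity, one node per fact.

    Each edge (fact_index, entity, position) links a fact node to an entity
    and records which argument slot the entity fills.
    """
    entities = sorted({arg for _, *args in facts for arg in args})
    edges = [
        (f, arg, pos)
        for f, (_, *args) in enumerate(facts)
        for pos, arg in enumerate(args)
    ]
    return entities, edges


def refine(facts, rounds=2):
    """Run color refinement; colors never contain entity names."""
    entities, edges = facts_to_bipartite(facts)
    ent_color = {e: "entity" for e in entities}
    fac_color = {f: fact[0] for f, fact in enumerate(facts)}
    for _ in range(rounds):
        # Fact nodes combine their own color with their arguments' colors,
        # ordered by argument position.
        new_fac = {
            f: (
                fac_color[f],
                tuple(
                    ent_color[e]
                    for g, e, _ in sorted(edges, key=lambda t: t[2])
                    if g == f
                ),
            )
            for f in fac_color
        }
        # Entity nodes combine their own color with the sorted multiset of
        # (fact color, argument position) pairs from incident facts.
        new_ent = {
            e: (
                ent_color[e],
                tuple(sorted((repr(fac_color[g]), pos) for g, x, pos in edges if x == e)),
            )
            for e in entities
        }
        fac_color, ent_color = new_fac, new_ent
    return ent_color, fac_color


ent_color, fac_color = refine(FACTS)
```

Because the colors depend only on predicates and argument positions, renaming the entities or changing their number does not require a new model, which is the property that makes such a representation inductive.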

Paper Structure

This paper contains 62 sections, 8 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Renderings of factored state and action representations.
  • Figure 2: Null distribution from permutation test between average scores of Vejde and MLP agents on all problems. $p$-value is calculated as $P(X>0.0495)\cdot 2$.
  • Figure 3: Null distribution from permutation test between average scores on test and train set problems for imitation learning agents. $p$-value is calculated as $P(X>0.0086)\cdot 2$.
  • Figure 4: Box plots of normalized scores of Vejde agents, MLP agents and Prost for each domain.
  • Figure 5: A bar chart showing training times in hours for message passing agents and graph attention agents on each domain. Times were averaged over five runs.
  • ...and 1 more figures
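
The captions of Figures 2 and 3 give the $p$-value as twice a one-sided tail probability of the null distribution. A minimal sketch of such a permutation test on the difference in mean scores follows. The function name, the number of permutations and the score lists are hypothetical and are not the paper's data or code.

```python
import random


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns the observed difference and p = 2 * P(X > |observed|), where X is
    the mean difference under random reassignment of scores to the two groups.
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[: len(a)], pooled[len(a):]
        diff = sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b)
        if diff > abs(observed):
            exceed += 1
    # Doubling the one-sided tail can exceed 1, so cap the result.
    return observed, min(1.0, 2 * exceed / n_perm)


# Hypothetical normalized scores for two groups of agents.
group_a = [0.90, 0.95, 1.00, 0.92, 0.97, 0.99, 0.93, 0.96]
group_b = [0.10, 0.15, 0.20, 0.12, 0.17, 0.19, 0.13, 0.16]
observed, p_value = permutation_test(group_a, group_b)
```

A small $p$-value indicates that a difference as large as the observed one rarely arises when group labels are assigned at random.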