The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search

Forest Agostinelli

Abstract

DeepXube is a free and open-source Python package and command-line tool that seeks to automate the solution of pathfinding problems by using machine learning to learn heuristic functions that guide heuristic search algorithms tailored to deep neural networks (DNNs). DeepXube integrates recent advances in deep reinforcement learning, heuristic search, and formal logic for solving pathfinding problems. These include limited-horizon Bellman-based learning, hindsight experience replay, batched heuristic search, and specifying goals with answer-set programming. A robust multiple-inheritance structure simplifies the definition of pathfinding domains and the generation of training data. Training heuristic functions is made efficient by automatically parallelizing the generation of training data across central processing units (CPUs) and reinforcement learning updates across graphics processing units (GPUs). Pathfinding algorithms that take advantage of the parallelism of GPUs and DNN architectures, such as batch weighted A* and Q* search, and beam search, can be applied to pathfinding problems through command-line arguments. Finally, several convenience features for visualization, code profiling, and progress monitoring during training and solving are available. The GitHub repository is publicly available at https://github.com/forestagostinelli/deepxube.
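
To make the idea of batched heuristic search concrete, the following is a minimal, self-contained sketch of a batch weighted A* loop on a toy domain. It is not DeepXube's implementation or API: the function `batch_weighted_astar`, its parameters, the priority `weight * g + h`, and the toy line-world domain are all assumptions made for illustration. The point it tries to show is that several nodes are removed from the open list per iteration and all of their children are scored in a single heuristic call, which is where a DNN evaluated on a GPU would be used.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

State = Hashable


def batch_weighted_astar(
    start: State,
    is_goal: Callable[[State], bool],
    expand: Callable[[State], Iterable[Tuple[State, float]]],
    heuristic_batch: Callable[[Sequence[State]], Sequence[float]],
    batch_size: int = 4,
    weight: float = 1.0,
    max_iters: int = 10_000,
) -> Optional[List[State]]:
    """Return a path from start to a goal state, or None if none is found.

    Hypothetical sketch: nodes are prioritized by weight * g + h (an assumption).
    """
    best_g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Optional[State]] = {start: None}
    tie = 0  # tie-breaker so the heap never compares states directly
    open_heap = [(float(heuristic_batch([start])[0]), tie, start)]

    for _ in range(max_iters):
        if not open_heap:
            return None
        # Pop up to batch_size nodes with the lowest priority.
        popped = []
        while open_heap and len(popped) < batch_size:
            popped.append(heapq.heappop(open_heap)[2])

        # Stop as soon as any popped node is a goal (not guaranteed optimal).
        for state in popped:
            if is_goal(state):
                path = [state]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]

        # Expand all popped nodes, keeping children whose path cost improved.
        children = []
        for state in popped:
            for child, cost in expand(state):
                g = best_g[state] + cost
                if g < best_g.get(child, float("inf")):
                    best_g[child] = g
                    parent[child] = state
                    children.append(child)

        # One heuristic call for the whole batch of children.
        if children:
            for child, h in zip(children, heuristic_batch(children)):
                tie += 1
                heapq.heappush(open_heap, (weight * best_g[child] + float(h), tie, child))
    return None


# Toy line-world: states are integers 0..10, actions move one step left or right.
GOAL = 5


def expand_line(state: int) -> List[Tuple[int, float]]:
    return [(n, 1.0) for n in (state - 1, state + 1) if 0 <= n <= 10]


def h_batch(states: Sequence[int]) -> List[float]:
    # Stand-in for a learned heuristic: exact distance to the goal.
    return [float(abs(s - GOAL)) for s in states]


path = batch_weighted_astar(0, lambda s: s == GOAL, expand_line, h_batch, batch_size=2)
print(path)
```

In this sketch, a larger `batch_size` trades some search efficiency for fewer, larger heuristic calls, and a `weight` below 1 makes the search greedier with respect to the heuristic.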

Paper Structure (29 sections, 4 equations, 9 figures)

Figures (9)

  • Figure 1: An overview of how DeepXube trains heuristic functions. The user implements the domain, along with its states, actions, and goals, as well as a DNN architecture that processes states, goals, and possibly actions. DeepXube then samples problem instances from the domain and uses the DNN to attempt to solve them with pathfinding, generating nodes and edges from the search tree. A reinforcement learning update is then applied to either the nodes or edges to compute targets. The DNN is then trained to match the targets.
  • Figure 2: The primary mixin classes used to construct domains. Methods in white are abstract and methods in black are implemented by the class. Mixins inherit all functionality of their ancestors.
  • Figure 3: A visualization of the problem instance generation function along with two mixin classes for generating problem instances by either 1) sampling a start state, taking a random walk, and sampling a goal from the terminal state of that random walk; or 2) sampling a goal and a corresponding goal state, taking a random walk in reverse, and using the terminal state of that random walk as the start state.
  • Figure 4: The interaction between states, goals, actions, neural network inputs, and heuristic/policy functions. Heuristic-q functions either take an action as an input or output the q-values for all possible actions. A policy function takes an action as an input during training and samples actions during inference.
  • Figure 5: The available pathfinding algorithms. The mixin class that the domain must subclass is in blue and the functions used are in green. If both a heuristic and policy function are used, then the pathfinding algorithm is guided by the heuristic function and the policy function is used to sample actions.
  • ...and 4 more figures
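
The captions for Figures 2 and 3 describe domains assembled from mixin classes, with problem instances generated by a random walk from a sampled start state. The sketch below illustrates that pattern under stated assumptions. All class and method names (`ToyDomain`, `RandomWalkInstances`, `RingDomain`, `sample_instance`, and so on) are hypothetical and are not taken from DeepXube; the ring-shaped state space is a toy chosen only so the example is runnable.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Tuple


class ToyDomain(ABC):
    """Hypothetical base class: the user supplies states, actions, and goals."""

    @abstractmethod
    def sample_start_state(self, rng: random.Random) -> int: ...

    @abstractmethod
    def get_actions(self, state: int) -> List[int]: ...

    @abstractmethod
    def next_state(self, state: int, action: int) -> int: ...

    @abstractmethod
    def sample_goal(self, state: int) -> int:
        """Return a goal that the given state satisfies."""


class RandomWalkInstances(ToyDomain):
    """Hypothetical mixin: builds (start, goal) pairs from the abstract methods."""

    def random_walk(self, state: int, num_steps: int, rng: random.Random) -> int:
        for _ in range(num_steps):
            action = rng.choice(self.get_actions(state))
            state = self.next_state(state, action)
        return state

    def sample_instance(self, num_steps: int, rng: random.Random) -> Tuple[int, int]:
        # Sample a start state, walk randomly, and derive the goal from
        # the terminal state of the walk.
        start = self.sample_start_state(rng)
        terminal = self.random_walk(start, num_steps, rng)
        return start, self.sample_goal(terminal)


class RingDomain(RandomWalkInstances):
    """Toy domain: positions on a ring; actions step one position either way."""

    def __init__(self, size: int) -> None:
        self.size = size

    def sample_start_state(self, rng: random.Random) -> int:
        return rng.randrange(self.size)

    def get_actions(self, state: int) -> List[int]:
        return [-1, 1]

    def next_state(self, state: int, action: int) -> int:
        return (state + action) % self.size

    def sample_goal(self, state: int) -> int:
        # In this toy, a goal is simply the target position itself.
        return state


domain = RingDomain(size=12)
start, goal = domain.sample_instance(num_steps=5, rng=random.Random(0))
print(start, goal)
```

Because the goal comes from the end of a walk that begins at the start state, every generated instance is solvable within `num_steps` actions, which is the property that makes this style of generation useful for producing training data.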