A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

Forest Agostinelli, Shahaf S. Shperberg, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi

TL;DR

This work introduces Q*, a search algorithm that leverages heuristics capable of receiving a state and, in a single function call, returning cost-to-go estimates for all possible transitions from that state, without the need to apply the transitions or generate the successor states. Such state-action estimates are typically known as Q-values.

Abstract

Efficiently solving problems with large action spaces using A* search remains a significant challenge. This is because, for each iteration of A* search, the number of nodes generated and the number of heuristic function applications grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approximators, such as deep neural networks. To address this issue, we introduce Q*, a search algorithm that leverages heuristics capable of receiving a state and, in a single function call, returning cost-to-go estimates for all possible transitions from that state, along with estimates of the corresponding transition costs, without the need to apply the transitions or generate the successor states. Such state-action estimates are typically known as Q-values. This significantly reduces computation time and memory usage. In addition, we prove that Q* search is guaranteed to find a shortest path given a heuristic function that does not overestimate the sum of the transition cost and cost-to-go of the state. To obtain heuristics for Q* search, we employ a deep Q-network architecture to learn a state-action heuristic function from domain interaction, without any prior knowledge. We use Q* with our learned heuristic on different domains and action spaces, showing that Q* suffers from only a small runtime overhead as the size of the action space increases. In addition, our empirical results show that Q* search is up to 129 times faster and generates up to 1288 times fewer nodes than A* search.
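
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `q_star_sketch`, the toy ring domain (`N`, `K`, `TARGET`, `step`), and the lookup table `Q_TABLE` that stands in for a trained deep Q-network are all assumptions made for illustration. The sketch reads the Q-value of a state-action pair as the transition cost plus the cost-to-go of the successor, and it assumes this value is never overestimated. The termination rule (keep searching until no open entry can beat the best goal found so far) is a conservative choice for this sketch and is not taken from the paper.

```python
import heapq

INF = float("inf")


def q_star_sketch(start, is_goal, step, q_values):
    """Best-first search over (state, action) pairs.

    q_values(s)[a] is assumed to estimate, without overestimating,
    (cost of action a in s) + (cost-to-go of the resulting state).
    Returns (cost of the path found or None, counters).
    """
    stats = {"states_generated": 1, "heuristic_calls": 0}
    if is_goal(start):
        return 0.0, stats

    best_g = {start: 0.0}   # cheapest known path cost to each generated state
    best_goal_cost = None   # cost of the best goal path found so far
    open_heap = []          # entries: (f, tie_breaker, g, state, action)
    counter = 0

    def push_actions(state, g):
        nonlocal counter
        stats["heuristic_calls"] += 1  # one call scores every action
        for action, q in enumerate(q_values(state)):
            heapq.heappush(open_heap, (g + q, counter, g, state, action))
            counter += 1

    push_actions(start, 0.0)
    while open_heap:
        f, _, g, state, action = heapq.heappop(open_heap)
        if best_goal_cost is not None and f >= best_goal_cost:
            break  # no remaining pair can improve on the goal already found
        if g > best_g.get(state, INF):
            continue  # stale entry: a cheaper path to this state was found
        child, cost = step(state, action)  # successor is generated only now
        stats["states_generated"] += 1
        child_g = g + cost
        if child_g >= best_g.get(child, INF):
            continue  # child already reached at least as cheaply
        best_g[child] = child_g
        if is_goal(child):
            best_goal_cost = child_g
            continue
        push_actions(child, child_g)
    return best_goal_cost, stats


# Hypothetical toy domain: states 0..N-1 on a ring, action a moves a+1 steps
# forward at unit cost.
N, K, TARGET = 20, 3, 10


def step(state, action):
    return (state + action + 1) % N, 1.0


def _exact_q(state):
    # Transition cost (1) plus the exact cost-to-go of the successor.
    values = []
    for action in range(K):
        remaining = (TARGET - (state + action + 1)) % N
        values.append(1 + (remaining + K - 1) // K)
    return values


# Precomputed table standing in for a trained network: a lookup returns the
# estimates for all K actions at once, and the search never calls step() to
# obtain them.
Q_TABLE = {s: _exact_q(s) for s in range(N)}

cost, stats = q_star_sketch(0, lambda s: s == TARGET, step, Q_TABLE.__getitem__)
print(cost, stats)
```

In this sketch each heuristic call scores every action of a state at once, and a successor state is only generated when its (state, action) pair is removed from the priority queue. An A*-style expansion would instead generate all `K` successors of a state and apply the heuristic to each of them, which is the linear growth in the action-space size that the abstract refers to.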

Paper Structure

This paper contains 18 sections, 2 theorems, 14 equations, 7 figures, 5 tables, and 2 algorithms.

Key Result

Lemma 1

As long as BWQS has not terminated, either OPEN contains a node corresponding to a prefix of some shortest path from $\mathit{start}$ to $\mathit{goal}$, or a shortest path from $\mathit{start}$ to $\mathit{goal}$ has already been discovered.

Figures (7)

  • Figure 1: Example demonstrating the node generations and heuristic calls by each algorithm.
  • Figure 2: Comparison of DAVI and Q-learning architectures. Both share the same backbone but differ in their output layers.
  • Figure 3: Relationship between the average path cost and the average time to find a solution.
  • Figure 4: Relationship between the average path cost and the average node generations.
  • Figure 5: Action space size ablation study on Rubik's cube: average path cost vs average time to find a solution.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Theorem 1
  • proof