A Data Efficient Framework for Learning Local Heuristics

Rishi Veerapaneni, Jonathan Park, Muhammad Suhail Saleem, Maxim Likhachev

TL;DR

This work tackles the data cost of learning local heuristic (LH) residuals for A* by introducing DE-LoHA*, a backtracking-based framework that collects LH data during a single global A* run rather than through many separate local searches. It formalizes the local residual as $h_k(s) = h_{gk}(s) - h_g(s)$ with region definitions $LR(s)$ and $LRB(s)$, and shows that a global-local ordering consistency lets the global search identify each state's best border state by backtracking. DE-LoHA* needs less than 1/10th of the expansions used by the baselines to collect the same amount of data, and it enables online learning: the LH model is updated at test time as LoHA* solves start-goal problems, demonstrated in a 4D navigation domain. The results point to practical benefits for online adaptation and suggest applicability to other learning-from-search settings; downweighting incomplete data points improves robustness.

Abstract

With the advent of machine learning, there have been several recent attempts to learn effective and generalizable heuristics. Local Heuristic A* (LoHA*) is one recent method that, instead of learning the entire heuristic estimate, learns a "local" residual heuristic that estimates the cost to escape a region (Veerapaneni et al. 2023). LoHA*, like other supervised learning methods, collects a dataset of target values by querying an oracle on many planning problems (in this case, local planning problems). This data collection process can become slow as the size of the local region increases or if the domain requires expensive collision checks. Our main insight is that when an A* search solves a start-goal planning problem, it inherently ends up solving multiple local planning problems. We exploit this observation to propose an efficient data collection framework that does less than 1/10th the amount of work (measured by expansions) of the baselines to collect the same amount of data. This idea also enables us to run LoHA* in an online manner, where we iteratively collect data and improve our model while solving relevant start-goal tasks. We demonstrate the performance of our data collection and online framework on a 4D $(x, y, θ, v)$ navigation domain.
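
The online mode described in the abstract can be sketched as a simple control loop. This is only an illustration under stated assumptions: `run_online`, `solve_and_collect`, and `retrain` are hypothetical names, not the authors' API, and the retraining interval of 5 is taken from the Figure 3(c) caption. With no model available, the solver is assumed to fall back to plain global A*.

```python
def run_online(problems, solve_and_collect, retrain, retrain_every=5):
    """Solve problems in order, pooling search data and retraining periodically.

    solve_and_collect(problem, model) -> (complete, incomplete, n_expanded)
    retrain(complete, incomplete)     -> new model
    Both callables are hypothetical placeholders supplied by the caller.
    """
    model = None                    # no learned LH yet (assumed: plain A*)
    complete, incomplete = [], []   # pooled training targets
    expansions = []                 # per-problem work, to track improvement
    for i, problem in enumerate(problems, start=1):
        new_c, new_ic, n_expanded = solve_and_collect(problem, model)
        complete.extend(new_c)
        incomplete.extend(new_ic)
        expansions.append(n_expanded)
        if i % retrain_every == 0:
            model = retrain(complete, incomplete)
    return model, expansions


# Stub components with made-up numbers, purely to show the control flow.
def fake_solve(problem, model):
    return [(problem, 0.0)], [], (100 if model is None else 40)

def fake_retrain(complete, incomplete):
    return {"trained_on": len(complete)}

model, expansions = run_online(range(12), fake_solve, fake_retrain)
```

The stubs return invented expansion counts and are not results from the paper. The point is only that data collection and retraining happen inside the same loop that solves the start-goal problems.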

Paper Structure (8 sections, 1 theorem, 1 equation, 3 figures, 1 algorithm)

Key Result

Theorem 1

A local A* using priority $b(s,s')$ and a global A* using $b(s_{start},s')$ will sort states originating from $s$ in $LR(s)$ identically.
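
A short sketch of why this can hold, under an assumption about notation that this summary does not spell out: suppose $b(x, s') = c(x, s') + h_g(s')$, where $c(x, s')$ is the cost of the path to $s'$ found by a search rooted at $x$ and $h_g$ is the global heuristic. Then, for any $s'$ in $LR(s)$ whose path in the global search passes through $s$:

$$
b(s_{start}, s') = c(s_{start}, s) + c(s, s') + h_g(s') = c(s_{start}, s) + b(s, s')
$$

The offset $c(s_{start}, s)$ does not depend on $s'$, so under this assumption both priorities rank these states in the same order.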

Figures (3)

  • Figure 1: Figure borrowed from LoHA* (Veerapaneni et al. 2023). Instead of estimating the entire cost-to-go from state $s$ (red diamond) to the goal (orange in left), LoHA* computes the residual cost for $s$ to reach a border region (red box in left, zoomed in on right). This avoids local minima when used during search.
  • Figure 2: A simplified example of global A* collecting data for local regions with $K=3$, i.e. for $s=(x,y)$ we want to reach a state $(x',y')$ with $x'$ or $y'$ 3 away. Each (i) marks the $i^{th}$ state expanded, with successor $\rightarrow$ parent arrows shown. Expanding (2) and (3) yields incomplete data $D_{ic}$ for $s_1, s_2$, as we made some progress but have not yet reached a state in $LRB$. Expanding $s_4$ and backtracking reveals that $s_4 \in LRB(s_1)$; we have found $LH(s_1)$ and add it to our complete dataset $D_c$. We continue to expand a node (purple) and backtrack (blue arrows) to update the values of ancestors (blue) whose $LH(s')$ has not yet been computed. Bottom right: After 10 expansions, we collect 4 complete (green) and 2 partial (blue) LH values, and cannot collect data for leaf nodes (red).
  • Figure 3: (a) compares collecting data via a ground truth oracle (Local A*) against our "Complete" and "Incomplete" collection methods. (b) plots the "Speed-up" (as measured in nodes expanded) of LoHA* trained on datasets gathered from Local A* calls (true oracle) or our datasets from running a global A* on start-goal problems. (c) We run DE-LoHA* by solving start-goal problems (and collecting data while doing so), and retraining every 5 problems. We see that DE-LoHA* can improve performance from just solving start-goal problems without needing an external oracle after the initial 5 problems.
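
The backtracking data collection described in the Figure 2 caption can be sketched on a toy 2D grid. This is a minimal illustration, not the authors' implementation. It assumes a 4-connected unit-cost grid, a Manhattan-distance global heuristic, a border defined as being exactly `K` away in $x$ or $y$, and a residual target of path cost plus the change in $h_g$. The function name `collect_lh_data` and the choice to backtrack through every ancestor are likewise assumptions.

```python
import heapq

def collect_lh_data(grid, start, goal, K):
    """One global A* run that harvests local-heuristic targets by backtracking.

    grid[r][c] == 1 marks an obstacle. Returns (complete, incomplete, expansions).
    """
    rows, cols = len(grid), len(grid[0])

    def h(s):  # assumed global heuristic h_g: Manhattan distance to the goal
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])

    g = {start: 0}
    parent = {start: None}
    closed = set()
    complete, incomplete = {}, {}
    counter = 0  # tie-breaker so the heap never has to compare states
    open_list = [(h(start), counter, start)]

    while open_list:
        _, _, n = heapq.heappop(open_list)
        if n in closed:
            continue  # stale duplicate entry
        closed.add(n)

        # Backtrack through every ancestor of the node just expanded.
        a = parent[n]
        while a is not None:
            if a not in complete:
                # Residual: cost from a to n beyond what h_g predicted at a.
                residual = (g[n] - g[a]) + h(n) - h(a)
                if max(abs(n[0] - a[0]), abs(n[1] - a[1])) == K:
                    complete[a] = residual      # n lies on the border of a
                    incomplete.pop(a, None)
                else:
                    incomplete[a] = residual    # progress so far, no border yet
            a = parent[a]

        if n == goal:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            m = (n[0] + dr, n[1] + dc)
            if not (0 <= m[0] < rows and 0 <= m[1] < cols) or grid[m[0]][m[1]]:
                continue
            new_g = g[n] + 1
            if new_g < g.get(m, float("inf")):
                g[m], parent[m] = new_g, n
                counter += 1
                heapq.heappush(open_list, (new_g + h(m), counter, m))
    return complete, incomplete, len(closed)


# 7x7 grid with a wall in column 3 (rows 1..5) between start and goal.
grid = [[1 if c == 3 and 1 <= r <= 5 else 0 for c in range(7)] for r in range(7)]
complete, incomplete, expansions = collect_lh_data(grid, (3, 0), (3, 6), K=2)
print(len(complete), "complete,", len(incomplete), "incomplete,", expansions, "expansions")
print("residual at (3, 2):", complete[(3, 2)])
```

In this toy example the state directly in front of the wall, `(3, 2)`, ends up with a residual of 4, while the start state gets 0 because the search reaches its border without any detour. Every target comes from expansions the global search had to perform anyway, which is the source of the savings the paper reports.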

Theorems & Definitions (2)

  • Theorem 1: Global-Local Ordering Consistency
  • proof