A Data Efficient Framework for Learning Local Heuristics
Rishi Veerapaneni, Jonathan Park, Muhammad Suhail Saleem, Maxim Likhachev
TL;DR
This work tackles the data-intensive nature of learning local heuristic (LH) residuals for A* by introducing DE-LoHA*, a backtracking-based framework that collects LH data during a single global A* run rather than through many separate local searches. It formalizes the local residual as $h_k(s) = h_{gk}(s) - h_g(s)$ with region definitions $LR(s)$ and $LRB(s)$, and shows how a global-local ordering consistency allows efficient data collection by backtracking to identify the best border states. DE-LoHA* collects the same amount of data with less than one-tenth of the work (measured by expansions) and enables online learning by updating the LH model at test time as LoHA* solves start-goal problems, demonstrated in a 4D navigation domain. The results indicate practical benefits for online adaptation and suggest applicability to other learning-from-search settings, with robustness improved by downweighting incomplete data points.
Abstract
With the advent of machine learning, there have been several recent attempts to learn effective and generalizable heuristics. Local Heuristic A* (LoHA*) is one recent method that, instead of learning the entire heuristic estimate, learns a "local" residual heuristic that estimates the cost to escape a region (Veerapaneni et al. 2023). LoHA*, like other supervised learning methods, collects a dataset of target values by querying an oracle on many planning problems (in this case, local planning problems). This data collection process can become slow as the size of the local region increases or if the domain requires expensive collision checks. Our main insight is that when an A* search solves a start-goal planning problem, it inherently ends up solving multiple local planning problems. We exploit this observation to propose an efficient data collection framework that does less than one-tenth of the work (measured by expansions) of the baselines to collect the same amount of data. This idea also enables us to run LoHA* in an online manner, where we iteratively collect data and improve our model while solving relevant start-goal tasks. We demonstrate the performance of our data collection and online framework on a 4D $(x, y, \theta, v)$ navigation domain.
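To make the local residual concrete, the following is a minimal sketch of the baseline style of data collection: one local search per queried state. It is not the authors' implementation and it is not DE-LoHA*. It assumes a 4-connected, unit-cost 2D grid, Manhattan distance as the global heuristic $h_g$, a square local region of half-width `K` around `s` standing in for $LR(s)$, and the first cells reached outside that square standing in for $LRB(s)$. The function name `local_residual` and all parameters are hypothetical.

```python
import heapq


def local_residual(grid, s, goal, K):
    """Return min over border states s' of [c(s, s') + h_g(s')] minus h_g(s).

    Illustrative assumptions (not from the paper): unit-cost 4-connected grid
    where 0 is free and 1 is blocked, Manhattan h_g, square region of
    half-width K, and the goal treated as a terminal state if it lies inside
    the region.
    """
    rows, cols = len(grid), len(grid[0])

    def h_g(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    def in_region(cell):
        return max(abs(cell[0] - s[0]), abs(cell[1] - s[1])) <= K

    dist = {s: 0}
    frontier = [(0, s)]
    best = float("inf")
    while frontier:
        d, cell = heapq.heappop(frontier)
        if d > dist[cell]:
            continue  # stale queue entry
        if cell == goal or not in_region(cell):
            # Border (or goal) state: score it and do not expand further.
            best = min(best, d + h_g(cell))
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            nd = d + 1
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(frontier, (nd, nxt))
    return best - h_g(s)


# Usage: a 7x7 grid, first open, then with a short wall right of the start.
open_grid = [[0] * 7 for _ in range(7)]
walled = [row[:] for row in open_grid]
for r in (2, 3, 4):
    walled[r][1] = 1

res_open = local_residual(open_grid, (3, 0), (3, 6), 1)
res_walled = local_residual(walled, (3, 0), (3, 6), 1)
```

In the open grid the residual is zero because the global heuristic is already exact. With the wall, the search has to detour to leave the region, so the residual is positive. Every training label gathered this way costs a separate local search, which is the expense the abstract proposes to avoid by reusing the work of a single global A* run.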
