On Using Admissible Bounds for Learning Forward Search Heuristics

Carlos Núñez-Molina, Masataro Asai, Pablo Mesejo, Juan Fernández-Olivares

TL;DR

This paper addresses the theoretical gap in learning forward search heuristics by advocating a distributional view that leverages admissible bounds. It models the learned cost-to-go with a Truncated Gaussian ${\mathcal{T}}{\mathcal{N}}(x|\mu,\sigma,l,u)$, where the lower bound $l$ is provided by an admissible heuristic, and derives a corresponding negative log-likelihood loss that differs from the commonly used MSE. Through experiments across four classical planning domains using Neural Logic Machines, STRIPS-HGN, and Linear Regression, the authors demonstrate faster convergence and improved heuristic quality when using the truncated model, especially when learning the distributional parameter $\sigma$ and employing residual learning. The approach yields better planning performance in Greedy Best-First Search compared to baselines like $h^{\mathrm{FF}}$, illustrating the practical impact of incorporating admissible bounds into learning. The work lays groundwork for future extensions to iterative search with upper bounds and reinforcement learning settings.

Abstract

In recent years, there has been growing interest in utilizing modern machine learning techniques to learn heuristic functions for forward search algorithms. Despite this, there has been little theoretical understanding of what they should learn, how to train them, and why we do so. This lack of understanding has resulted in the adoption of diverse training targets (suboptimal vs optimal costs vs admissible heuristics) and loss functions (e.g., square vs absolute errors) in the literature. In this work, we focus on how to effectively utilize the information provided by admissible heuristics in heuristic learning. We argue that learning from poly-time admissible heuristics by minimizing mean square errors (MSE) is not the correct approach, since its result is merely a noisy, inadmissible copy of an efficiently computable heuristic. Instead, we propose to model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution. This results in a different loss function from the MSE commonly employed in the literature, which implicitly models the learned heuristic as a Gaussian distribution. We conduct experiments where both MSE and our novel loss function are applied to learning a heuristic from optimal plan costs. Results show that our proposed method converges faster during training and yields better heuristics.
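
To make the difference between the two losses concrete, the following minimal Python sketch compares a Gaussian negative log-likelihood (which reduces to the squared error up to constants when $\sigma$ is fixed) with a truncated Gaussian one. The function names and the example numbers are illustrative assumptions, not taken from the paper, and the implementation is numerically naive (it does not guard against a vanishing normalization mass).

```python
import math

def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under N(mu, sigma)."""
    return (0.5 * ((x - mu) / sigma) ** 2
            + math.log(sigma)
            + 0.5 * math.log(2.0 * math.pi))

def truncated_gaussian_nll(x, mu, sigma, l, u=math.inf):
    """Negative log-likelihood of x under a Gaussian truncated to [l, u].

    The truncated density is the Gaussian density divided by the
    probability mass that the untruncated Gaussian places on [l, u].
    """
    if not (l <= x <= u):
        return math.inf  # zero density outside the bounds
    alpha = (l - mu) / sigma
    beta = (u - mu) / sigma
    mass = std_normal_cdf(beta) - std_normal_cdf(alpha)
    return gaussian_nll(x, mu, sigma) + math.log(mass)

# Hypothetical example: optimal cost 5, admissible lower bound 3, prediction 4.
h_star, lower, mu, sigma = 5.0, 3.0, 4.0, 1.0
print(gaussian_nll(h_star, mu, sigma))
print(truncated_gaussian_nll(h_star, mu, sigma, lower))
```

The extra $\log(\Phi(\beta)-\Phi(\alpha))$ term depends on $\mu$, $\sigma$, and the bounds, which is why minimizing this loss is not equivalent to minimizing MSE.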

Paper Structure

This paper contains 38 sections, 19 equations, 2 figures, and 14 tables.

Figures (2)

  • Figure 1: The probability density functions (PDFs) of Truncated Gaussian distributions $p({\textnormal{x}})={\mathcal{T}}{\mathcal{N}}(\mu=0,\sigma=1,l,u)$ with several lower/upper bounds $(l,u)$. In the heuristic learning setting, ${\textnormal{x}}$ is the optimal solution cost $h^*$ sampled from the dataset and $\mu=\mu_{\theta}(s)$ is the prediction associated with a state $s$. The $(l,u)=(0.2,1.7)$ variant (yellow) shows that the mean ${\mathbb{E}}_{p({\textnormal{x}})}[{\textnormal{x}}]$, which we use as the search heuristic, respects the bounds $(l,u)$ even when the predicted $\mu=0$ lies outside $(l,u)$.
  • Figure 2: Comparison of the training curves ($x$-axis: training step) for the validation MSE loss ($y$-axis, logarithmic) between Gaussian (orange) and Truncated Gaussian (blue) models, with independent runs recorded on 5 random seeds each. The losses converge faster for the latter due to the additional information provided by the admissible lower bound $l = h^{\mathrm{LMcut}}$.
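
The claim in the Figure 1 caption, that the mean of the truncated distribution stays within $(l,u)$ even when $\mu$ lies outside, can be checked numerically. The sketch below uses the standard closed-form mean of a truncated normal distribution; the function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
import math

def std_normal_pdf(z):
    """PDF of the standard normal distribution (0.0 at +/- infinity)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_gaussian_mean(mu, sigma, l, u=math.inf):
    """Mean of a Gaussian N(mu, sigma) truncated to the interval [l, u].

    Closed form: mu + sigma * (pdf(alpha) - pdf(beta)) / (cdf(beta) - cdf(alpha)),
    with alpha = (l - mu) / sigma and beta = (u - mu) / sigma.
    """
    alpha = (l - mu) / sigma
    beta = (u - mu) / sigma
    mass = std_normal_cdf(beta) - std_normal_cdf(alpha)
    return mu + sigma * (std_normal_pdf(alpha) - std_normal_pdf(beta)) / mass

# Setting from the caption: mu = 0 lies below the interval (0.2, 1.7).
mean = truncated_gaussian_mean(mu=0.0, sigma=1.0, l=0.2, u=1.7)
print(mean, 0.2 < mean < 1.7)
```

With only a lower bound (the default `u=math.inf`), the same function returns a value above `l`, which matches the setting where an admissible heuristic supplies `l` and no upper bound is available.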