A Training Data Recipe to Accelerate A* Search with Language Models

Devaansh Gupta, Boyang Li

TL;DR

This work empirically disentangles the requirements of the A* search algorithm from the requirements of the LLM to generalise on this task, and finds an overlap between them: A* requires more accurate predictions on search nodes near the goal, and LLMs need the same set of nodes for effective generalisation.

Abstract

Combining Large Language Models (LLMs) with heuristic search algorithms like A* holds the promise of enhanced LLM reasoning and scalable inference. To accelerate training and reduce computational demands, we investigate the coreset selection problem for the training data of LLM heuristic learning. Few methods for learning heuristic functions consider the interaction between the search algorithm and the machine learning model. In this work, we empirically disentangle the requirements of the A* search algorithm from the requirements of the LLM to generalise on this task. Surprisingly, we find an overlap between their requirements: A* requires more accurate predictions on search nodes near the goal, and LLMs need the same set of nodes for effective generalisation. With these insights, we derive a data-selection distribution for learning LLM-based heuristics. On three classical planning domains (maze navigation, Sokoban, and sliding tile puzzles), our technique reduces the number of iterations required to find solutions by up to 15x, with a wall-clock search speed-up of up to 5x. The codebase is at https://github.com/devaansh100/a_star.
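
To make the setting concrete, the following is a minimal sketch, not the authors' implementation. It shows A* with a pluggable heuristic, counts search iterations (node expansions), and samples training nodes from a solution path with a bias towards the goal. Several parts are assumptions made purely for illustration: the toy grid, the Manhattan-distance function standing in for a learned LLM heuristic, the function names, and in particular the exponential weighting with its `temperature` parameter, which is a hypothetical placeholder and not the data-selection distribution derived in the paper.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Node = Tuple[int, int]

def neighbours(grid: List[str], node: Node):
    """Yield the free 4-connected neighbours of a cell ('#' marks a wall)."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)

def a_star(grid: List[str], start: Node, goal: Node,
           heuristic: Callable[[Node], float]):
    """Run A* with unit step costs; return (path or None, node expansions)."""
    open_heap = [(heuristic(start), 0, start)]  # entries are (f, g, node)
    best_g: Dict[Node, int] = {start: 0}
    parent: Dict[Node, Optional[Node]] = {start: None}
    closed = set()
    iterations = 0
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry for an already expanded node
        closed.add(node)
        iterations += 1
        if node == goal:
            path = []
            cur: Optional[Node] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1], iterations
        for nxt in neighbours(grid, node):
            new_g = g + 1
            if new_g < best_g.get(nxt, math.inf):
                best_g[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None, iterations

def sample_training_nodes(path: List[Node], k: int, temperature: float = 2.0,
                          rng: Optional[random.Random] = None):
    """Sample k (node, distance-to-goal) pairs, favouring nodes near the goal.

    ASSUMPTION: the exponential weighting is a placeholder for illustration,
    not the distribution derived in the paper.
    """
    rng = rng or random.Random(0)
    last = len(path) - 1  # path[i] is (last - i) steps away from the goal
    weights = [math.exp(-(last - i) / temperature) for i in range(len(path))]
    picks = rng.choices(range(len(path)), weights=weights, k=k)
    return [(path[i], last - i) for i in picks]

# Toy maze (hypothetical); '.' is free space and '#' is a wall.
GRID = [
    "....#",
    ".##.#",
    ".....",
    ".#.#.",
    "...#.",
]
START, GOAL = (0, 0), (4, 4)

def manhattan(node: Node) -> float:
    """Stand-in for a learned heuristic: Manhattan distance to GOAL."""
    return abs(node[0] - GOAL[0]) + abs(node[1] - GOAL[1])

informed_path, informed_iters = a_star(GRID, START, GOAL, manhattan)
blind_path, blind_iters = a_star(GRID, START, GOAL, lambda node: 0.0)
print("iterations with heuristic:", informed_iters)
print("iterations without heuristic:", blind_iters)
print("sampled training pairs:", sample_training_nodes(informed_path, k=5))
```

In this sketch, swapping `manhattan` for a model's predicted cost-to-goal would be the point at which a learned heuristic plugs into the search, and the iteration count is the quantity that a more accurate heuristic is expected to reduce.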

Paper Structure

This paper contains 51 sections, 7 equations, 5 figures, 11 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Validation MAE of models trained on the Initial, Middle, End, and All splits, and their corresponding exclusion sets. A lower value shows better generalisation.
  • Figure 2: Puzzle representation and legend of a training puzzle from Sokoban.
  • Figure 3: Puzzle representation and legend of a training puzzle from the maze dataset.
  • Figure 4: Puzzle representation and legend of a training puzzle from the sliding tile puzzle (STP) dataset.
  • Figure 5: Prompt used while training the language model. {curly braces} denote a placeholder.