Towards Learning Foundation Models for Heuristic Functions to Solve Pathfinding Problems

Vedant Khandelwal; Amit Sheth; Forest Agostinelli

TL;DR

A foundation model is introduced that uses deep reinforcement learning to train heuristic functions that adapt to new domains without further fine-tuning, showing the potential of foundation models to improve efficiency and adaptability in complex pathfinding problems.

Abstract

Pathfinding problems arise throughout robotics, computational science, and the natural sciences. Existing deep learning methods for solving them require training a deep neural network (DNN) for each new problem domain, which consumes substantial time and resources. This study introduces a foundation model that uses deep reinforcement learning to train heuristic functions that adapt to new domains without further fine-tuning. Building upon DeepCubeA, we provide the heuristic function with the domain's state transition information, which improves its adaptability. Using a puzzle generator for 15-puzzle domains with varied action spaces, we demonstrate that the model generalizes to and solves unseen domains. The learned heuristic values correlate strongly with ground truth heuristic values across domains, as measured by the R-squared and Concordance Correlation Coefficient metrics. These results underscore the potential of foundation models to improve efficiency and adaptability for AI-driven solutions to complex pathfinding problems.
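The two agreement metrics named in the abstract can be illustrated with a short sketch. It assumes the standard definitions (coefficient of determination, and Lin's concordance correlation coefficient with population moments); the paper's exact formulation may differ, and the function names `r_squared` and `concordance_cc` are hypothetical.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient using population moments."""
    n = len(y_true)
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    # The squared mean difference penalizes systematic offset,
    # which plain Pearson correlation would ignore.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Illustrative data only: a heuristic that overestimates every cost-to-go by 1.
ground_truth = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(r_squared(ground_truth, predicted))
print(concordance_cc(ground_truth, predicted))
```

Both metrics equal 1 only when predictions match the ground truth exactly; a constant offset lowers both even though the two series remain perfectly linearly correlated.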

Paper Structure (31 sections, 5 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Background and Literature Review
Foundation Model
Pathfinding
Review of DeepCubeA
Generalization in Pathfinding Problems
Theoretical Framework
Methodology
Environment Generator
Proposed Approximate Value Iteration
Evaluation Metrics
Experimental Setup
Model Architecture
Training Variations
Test Data Generation
...and 16 more sections

Figures (6)

Figure 1: Comparison of heuristic values predicted by the proposed model and by the model without action information against ground truth heuristic values for the 15-puzzle. The model with action information performs significantly better.
Figure 2: Comparison of trained and ground truth (GT) heuristic values for the 15-puzzle domain. The first three subfigures show the proposed model (P) and the last three show DeepCubeA (DCA) variants, each showing heuristic values for 1500 states across canonical actions (C), diagonal actions (D), and canonical + diagonal actions (C+D).
Figure 3: Example of a scrambled state and the goal state for the 8-puzzle domain. The cost-to-go is significantly reduced when including diagonal moves: 16 steps for canonical moves only versus 2 steps for canonical and diagonal moves combined.
Figure 4: Comparison of heuristic values predicted by the proposed model and by the model without action information against ground truth heuristic values for the 8-puzzle. The model with action information performs significantly better.
Figure 5: Comparison of trained and ground truth (GT) heuristic values for the 8-puzzle domain. The first three subfigures show the proposed model (P) and the last three show DeepCubeA (DCA) variants, each showing heuristic values for 1500 states across canonical actions (C), diagonal actions (D), and canonical + diagonal actions (C+D).
...and 1 more figure
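The effect described in the Figure 3 caption, where the same 8-puzzle state has a different cost-to-go under different action sets, can be reproduced with a small breadth-first search. This is a minimal sketch under stated assumptions, not the paper's generator: a diagonal move is assumed to swap the blank with a diagonally adjacent tile, all moves cost 1, the goal layout `GOAL` is assumed, and the example state is illustrative rather than the one in the figure.

```python
from collections import deque

# Assumed action sets, expressed as (row, column) offsets for the blank.
CANONICAL = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout; 0 is the blank


def neighbors(state, moves, n=3):
    """Yield every state reachable by moving the blank with one allowed offset."""
    blank = state.index(0)
    row, col = divmod(blank, n)
    for d_row, d_col in moves:
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < n and 0 <= new_col < n:
            target = new_row * n + new_col
            nxt = list(state)
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            yield tuple(nxt)


def cost_to_go(start, moves, goal=GOAL):
    """Breadth-first search; returns the minimum number of unit-cost moves,
    or None if the goal is unreachable with the given action set."""
    if start == goal:
        return 0
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        for nxt in neighbors(state, moves):
            if nxt == goal:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


# Illustrative state: two diagonal blank moves away from the goal.
start = (0, 2, 3, 4, 1, 6, 7, 8, 5)
print(cost_to_go(start, CANONICAL + DIAGONAL))
print(cost_to_go(start, CANONICAL))
```

Because the action set changes the ground truth cost-to-go of an identical state, a heuristic function that sees only the state cannot be correct for both domains, which is the motivation for giving the model action information.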
