Smart Language Agents in Real-World Planning
Annabelle Miin, Timothy Wei
TL;DR
The paper addresses constraint-based planning with large language models, focusing on the sole-planning mode of travel planning. It introduces a two-step framework that first auto-generates prompts from external resources and then refines them via human-in-the-loop failure analysis, using structured reference information to improve reasoning. Empirical results show that a single human-in-the-loop iteration improves the performance of the GPT-4o-based planner by about 139% relative to the automated-prompt baseline, approaching the performance of manually crafted prompts and exceeding the original baseline on several metrics. The work demonstrates the potential of semi-automated prompt tuning for complex, constraint-driven planning tasks and highlights avenues for broader application, with limitations around data diversity and scalability.
Abstract
Comprehensive planning agents have been a long-term goal in the field of artificial intelligence. Recent advances in Natural Language Processing, driven by Large Language Models (LLMs), have brought this goal closer. We seek to improve the travel-planning capability of such LLMs by building on the prior work TravelPlanner. We focus specifically on the "sole-planning" mode of travel planning; that is, the agent is given the necessary reference information, and its goal is to create a comprehensive plan from that information. While this setting does not fully simulate real-world planning, we believe that optimizing the sole-planning capability of a travel-planning agent can still enhance the overall user experience. We propose a semi-automated prompt generation framework that combines an LLM-automated prompt with "human-in-the-loop" feedback to iteratively refine the prompt and improve LLM performance. Our results show that the LLM-automated prompt has limitations and that "human-in-the-loop" refinement improves performance by $139\%$ in a single iteration.
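To make the shape of the semi-automated framework concrete, the following is a minimal Python sketch of the two steps described above: an automatically generated starting prompt, followed by a refinement loop in which failures are collected and a human reviewer supplies guidance. Every name here (`auto_generate_prompt`, `refine_prompt`, `pass_rate`, `relative_improvement`, and the `Planner`, `Checker`, and `Reviewer` callables) is a hypothetical placeholder, not the authors' implementation; in particular, the planner would be an LLM call in practice, and the step-1 stub only concatenates text rather than querying a model.

```python
from typing import Callable, List

# Hypothetical interfaces, assumed for illustration only.
Planner = Callable[[str, str], str]    # (prompt, query) -> plan text
Checker = Callable[[str, str], bool]   # (query, plan) -> True if constraints hold
Reviewer = Callable[[List[str]], str]  # failed queries -> guidance written by a human


def auto_generate_prompt(resources: List[str]) -> str:
    """Step 1 stand-in: an LLM would draft this; the stub just joins the resources."""
    return "You are a travel planner.\n" + "\n".join(resources)


def pass_rate(prompt: str, queries: List[str], plan: Planner, check: Checker) -> float:
    """Fraction of queries whose generated plan satisfies the checker."""
    if not queries:
        return 0.0
    passed = sum(1 for q in queries if check(q, plan(prompt, q)))
    return passed / len(queries)


def refine_prompt(
    prompt: str,
    queries: List[str],
    plan: Planner,
    check: Checker,
    review: Reviewer,
    iterations: int = 1,
) -> str:
    """Step 2: collect failures, ask the reviewer for guidance, append it to the prompt."""
    for _ in range(iterations):
        failures = [q for q in queries if not check(q, plan(prompt, q))]
        if not failures:
            break  # nothing left for the reviewer to analyse
        prompt = prompt + "\n" + review(failures)
    return prompt


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement in percent; baseline must be non-zero."""
    return (improved - baseline) / baseline * 100.0
```

Under this reading, a relative improvement of $139\%$ means the refined prompt's score is about 2.39 times the score of the automated-prompt baseline. The loop defaults to `iterations=1` to mirror the single human-in-the-loop iteration reported in the abstract.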
