Smart Language Agents in Real-World Planning

Annabelle Miin; Timothy Wei

TL;DR

The paper addresses enhancing constraint-based planning with large language models by focusing on sole-planning for travel. It introduces a two-step framework that first auto-generates prompts from external resources and then refines them via human-in-the-loop failure analysis, using structured reference information to improve reasoning. Empirical results show that a single human-in-the-loop iteration boosts the GPT-4o-based planner by about 139% compared with the automated prompt baseline, approaching the performance of manually crafted prompts and exceeding the original baseline in several metrics. The work demonstrates the potential of semi-automated prompt tuning to improve complex, constraint-driven planning tasks and highlights avenues for broader application, albeit with limitations around data diversity and scalability.

Abstract

Comprehensive planning agents have been a long-term goal in the field of artificial intelligence. Recent innovations in Natural Language Processing have yielded success through the advent of Large Language Models (LLMs). We seek to improve the travel-planning capability of such LLMs by extending the work of the previous paper TravelPlanner. Our objective is to explore a new method of using LLMs to improve the travel-planning experience. We focus specifically on the "sole-planning" mode of travel planning; that is, the agent is given the necessary reference information, and its goal is to create a comprehensive plan from that information. While this does not simulate the real world, we believe that optimizing the sole-planning capability of a travel-planning agent can still enhance the overall user experience. We propose a semi-automated prompt generation framework that combines an LLM-automated prompt with "human-in-the-loop" iteration to refine the prompt and improve LLM performance. Our results show that the LLM-automated prompt has its limitations and that "human-in-the-loop" refinement improves performance by $139\%$ in a single iteration.
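
As a reading aid, and assuming the $139\%$ figure is a relative gain over the automated-prompt baseline (as the TL;DR states), the relationship between the two scores can be written as below. The symbols $s_{\text{auto}}$ and $s_{\text{HITL}}$ are introduced here for illustration only and are not notation from the paper.

```latex
\[
\frac{s_{\text{HITL}} - s_{\text{auto}}}{s_{\text{auto}}} \times 100\% \approx 139\%
\quad\Longleftrightarrow\quad
s_{\text{HITL}} \approx 2.39\, s_{\text{auto}}
\]
```

Under that reading, one human-in-the-loop iteration would leave the score at roughly 2.4 times the automated-prompt score, not 1.39 times.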

Paper Structure (14 sections, 1 figure, 1 table)

Introduction
Related work
Methods: Framework for improving LLMs
Experiments
Set-up
Structured Reference Information
GPT-4 Turbo Baseline
Automation Module
Manual Prompt Creation
Evaluation
Data Splits
Results
Limitations
Conclusion

Figures (1)

Figure 1: Framework for using an automated prompt and "human-in-the-loop" iteration to improve the LLM's capability to produce the final plan. Here, the resources at the starting point refer to artifacts that contain the constraints the Planner needs to conform to. The prompt is improved via "human-in-the-loop" iteration, where $R_i$ stands for the generated prompt at the $i$th iteration. During each iteration, the generated prompt is used for LLM reasoning to produce a plan. The iteration stops when the performance of $R_{i-1}$ and $R_i$ is very close. All data in the system are colored yellow; all activities are colored blue.
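
The loop described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `refine_prompt`, the `evaluate` and `revise` callables, the `tolerance` threshold, and the toy rule list are all hypothetical names chosen here. In the paper, the revision step is performed by a human analysing failures; it is stood in for by a plain function so the sketch can run.

```python
from typing import Callable


def refine_prompt(
    initial_prompt: str,
    evaluate: Callable[[str], float],
    revise: Callable[[str, float], str],
    tolerance: float = 0.01,
    max_iterations: int = 10,
) -> tuple[str, list[float]]:
    """Iterate R_0, R_1, ... until two consecutive scores differ by <= tolerance."""
    prompt = initial_prompt
    scores = [evaluate(prompt)]  # performance of R_0 (the automated prompt)
    for _ in range(max_iterations):
        # Stand-in for the human-in-the-loop step: produce R_i from R_{i-1}.
        prompt = revise(prompt, scores[-1])
        scores.append(evaluate(prompt))
        # Stop when the performance of R_{i-1} and R_i is very close.
        if abs(scores[-1] - scores[-2]) <= tolerance:
            break
    return prompt, scores


# Toy stand-ins (assumptions for illustration only).
RULES = ["stay within budget", "respect minimum nights", "do not repeat restaurants"]


def toy_evaluate(prompt: str) -> float:
    # Score = fraction of constraint rules that the prompt mentions.
    return sum(rule in prompt for rule in RULES) / len(RULES)


def toy_revise(prompt: str, last_score: float) -> str:
    # Add the first missing rule; return the prompt unchanged if none is missing.
    for rule in RULES:
        if rule not in prompt:
            return prompt + " Rule: " + rule + "."
    return prompt


final_prompt, history = refine_prompt("Plan a trip.", toy_evaluate, toy_revise)
print(len(history) - 1, "iterations; final score", history[-1])
```

In this toy run the score rises for three iterations and then stays flat, which triggers the stopping condition. In practice, `evaluate` would run the planner on a validation split and `revise` would be a manual edit.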
