Towards a General Framework for HTN Modeling with LLMs
Israel Puerta-Merino, Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares
TL;DR
The paper addresses the gap in leveraging LLMs for hierarchical planning (HP) by extending the L2P framework to support HP modeling and evaluation. It introduces L2HP, an extensible framework with Hierarchical Task Network (HTN) data types, parsers, and an NL2HTN pipeline, enabling LLM-driven generation of domain and problem models for HP and supporting export to PDDL, HPDL, and HDDL. Empirically, experiments on PlanBench show a parsing success rate of around 36% for both automated planning (AP) and HP, but syntactic validity is much lower for HP (about 1% of instances) than for AP (about 20%), highlighting the unique difficulties HP poses for LLMs. Overall, the work provides a practical, reproducible platform for HP research with LLMs and outlines concrete directions for improving model quality, including HP-specific benchmarks, improved parsers, and more robust prompting strategies.
Abstract
The use of Large Language Models (LLMs) for generating Automated Planning (AP) models has been widely explored; however, their application to Hierarchical Planning (HP) is still far from reaching the level of sophistication observed in non-hierarchical architectures. In this work, we aim to address this gap with two main contributions. First, we propose L2HP, an extension of L2P (a library for LLM-driven PDDL model generation) that supports HP model generation and follows a design philosophy of generality and extensibility. Second, we use our framework to run experiments comparing the modeling capabilities of LLMs for AP and HP. On the PlanBench dataset, results show that parsing success is limited but comparable in both settings (around 36%), while syntactic validity is substantially lower in the hierarchical case (1% vs. 20% of instances). These findings underscore the unique challenges HP presents for LLMs, highlighting the need for further research to improve the quality of generated HP models.
