MinePlanner: A Benchmark for Long-Horizon Planning in Large Minecraft Worlds

William Hill, Ireton Liu, Anita De Mello Koch, Damion Harvey, Nishanth Kumar, George Konidaris, Steven James

TL;DR

MinePlanner presents a scalable framework and a 45-task benchmark for long-horizon planning in large Minecraft worlds, emphasizing open-world, object-dense environments. It supports both propositional and numeric PDDL representations and includes automatic task generation, plan verification, and visualization. Experimental results show that state-of-the-art domain-independent planners struggle with translation/grounding and scaling to thousands of objects, indicating substantial gaps in current planning approaches. The work aims to spur development of new planning techniques capable of handling complex, real-world-like domains and to bridge learning and planning through a challenging, parameterizable testbed.

Abstract

We propose a new benchmark for planning tasks based on the Minecraft game. Our benchmark contains 45 tasks overall and also supports automatically creating both propositional and numeric instances of new Minecraft tasks. We benchmark numeric and propositional planning systems on these tasks; the results demonstrate that state-of-the-art planners currently cannot handle many of the challenges posed by our benchmark, such as scaling to instances with thousands of objects. Based on these results, we identify areas for improvement in future planners. Our framework is available at https://github.com/IretonLiu/mine-pddl/.
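To make the propositional/numeric distinction concrete, the following is a minimal sketch of how the same block positions might be written as initial-state facts in each style. The predicate and function names (`at-x`, `succ`, `x`, and so on) and the helper functions are illustrative assumptions, not the encoding used in the MinePlanner repository. The point is only that a propositional encoding typically needs extra objects and facts to stand in for integers, whereas a numeric encoding states coordinates directly.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int, int]  # (x, y, z) integer coordinates


def propositional_init(blocks: Dict[str, Block]) -> Tuple[List[str], List[str]]:
    """Hypothetical propositional encoding: each coordinate value is an object."""
    values = sorted({c for pos in blocks.values() for c in pos})
    objects = [f"p{v}" for v in values] + sorted(blocks)
    facts: List[str] = []
    for name, (x, y, z) in sorted(blocks.items()):
        facts += [f"(at-x {name} p{x})", f"(at-y {name} p{y})", f"(at-z {name} p{z})"]
    # Successor facts stand in for arithmetic: "p1 comes right after p0".
    facts += [f"(succ p{a} p{b})" for a, b in zip(values, values[1:]) if b == a + 1]
    return objects, facts


def numeric_init(blocks: Dict[str, Block]) -> Tuple[List[str], List[str]]:
    """Hypothetical numeric encoding: coordinates are numeric fluents."""
    objects = sorted(blocks)
    facts: List[str] = []
    for name, (x, y, z) in sorted(blocks.items()):
        facts += [f"(= (x {name}) {x})", f"(= (y {name}) {y})", f"(= (z {name}) {z})"]
    return objects, facts


if __name__ == "__main__":
    world = {"log1": (0, 0, 0), "log2": (1, 0, 0)}
    for label, encode in (("propositional", propositional_init), ("numeric", numeric_init)):
        objs, init = encode(world)
        print(f"{label}: {len(objs)} objects, {len(init)} init facts")
```

Even in this two-block world, the propositional variant carries additional objects and successor facts; with thousands of blocks, a planner's grounding step has correspondingly more to process.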

Paper Structure (14 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: A classical planning problem in our benchmark, requiring the agent to collect the necessary blocks and build a log cabin (outlined in green). This task contains over 5000 objects that the agent must reason about, including many that make up the surrounding blocks and ground that are irrelevant to the goal.
  • Figure 2: Three variants of the task of navigating to a particular location. (a) The easy task contains no irrelevant blocks, so the world is empty. (b) The medium task contains a few additional blocks that serve as obstacles and make navigation more challenging. (c) The hard task requires navigating within a small village consisting of hundreds of objects that are irrelevant to this particular task.
  • Figure 3: Translation and planner search times for the move task, starting from the world size of the easy variant and increasing until the FastDownward Planner can no longer translate the problem. The solid lines and shaded areas represent the mean and standard deviation over five runs.