Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

Yuxuan Guo; Shaohui Peng; Jiaming Guo; Di Huang; Xishan Zhang; Rui Zhang; Yifan Hao; Ling Li; Zikang Tian; Mingju Gao; Yutai Li; Yiming Gan; Shuai Liang; Zihao Zhang; Zidong Du; Qi Guo; Xing Hu; Yunji Chen

TL;DR

Luban tackles open-ended creative tasks by introducing autonomous embodied verification, which grounds abstract criteria in verifiable actions. It combines a CAD-driven 3D structural speculation stage with a pragmatic in-environment verification stage to iteratively plan, build, and evaluate creations in Minecraft. Empirical results show that Luban outperforms baselines on multiple quality and pragmatism metrics and achieves a 100% pragmatic verification pass rate, with additional demonstrations on a real robotic arm. The approach advances open-ended creative agents by coupling CAD-style design programs with environment-based feedback, enabling self-improvement and potential real-world grounding.

Abstract

Building open agents has always been the ultimate goal in AI research, and creative agents are all the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., "mine diamonds" in Minecraft). However, they struggle on creative tasks with open goals and abstract criteria because they cannot bridge the gap between the two, and therefore lack feedback for self-improvement in solving the task. In this work, we introduce autonomous embodied verification techniques that let agents fill this gap, laying the groundwork for creative tasks. Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification inspired by human design practices: (1) visual verification of 3D structural speculations, which come from agent-synthesized CAD modeling programs; (2) pragmatic verification of the creation by generating and verifying environment-relevant functionality programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines ($33\%$ to $100\%$) in both visualization and pragmatism. Additional demos on a real-world robotic arm show Luban's creative potential in the physical world.
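The two-level verification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Candidate`, `select_by_visual_verification`, `pragmatic_verification`, and `create`, the callback signatures, and the retry loop that feeds failed checks back into the next proposal round are all assumptions made for illustration. In the paper the proposals and scores would come from a VLM and the checks would run inside Minecraft; here they are plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    """Hypothetical design candidate: a CAD-like program and its visual score."""
    program: str
    visual_score: float = 0.0


def select_by_visual_verification(
    candidates: List[Candidate], score_fn: Callable[[str], float]
) -> Candidate:
    """Level 1 (assumed form): score every speculated design, keep the best."""
    if not candidates:
        raise ValueError("at least one candidate design is required")
    for candidate in candidates:
        candidate.visual_score = score_fn(candidate.program)
    return max(candidates, key=lambda c: c.visual_score)


def pragmatic_verification(
    build: object, checks: Dict[str, Callable[[object], bool]]
) -> List[str]:
    """Level 2 (assumed form): run each functionality check on the build
    and return the names of the checks that failed."""
    return [name for name, check in checks.items() if not check(build)]


def create(
    propose: Callable[[List[str]], List[Candidate]],
    score_fn: Callable[[str], float],
    construct: Callable[[str], object],
    checks: Dict[str, Callable[[object], bool]],
    max_rounds: int = 3,
) -> Optional[object]:
    """Propose, select, build, and verify until all checks pass.

    Returns the first build that passes every check, or None if no build
    passes within max_rounds.
    """
    feedback: List[str] = []
    for _ in range(max_rounds):
        best = select_by_visual_verification(propose(feedback), score_fn)
        build = construct(best.program)
        feedback = pragmatic_verification(build, checks)
        if not feedback:
            return build
    return None
```

In this sketch the failed check names are the only feedback signal, which keeps the loop easy to follow; a real agent would presumably pass richer feedback, such as rendered views or execution traces, back to the model.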

Paper Structure (26 sections, 17 figures, 5 tables)

Introduction
Related Works
Problem Definition
Method
3D Structural Speculation stage with Visual Verification
Construction stage with the Pragmatic Verification
Experiments
Experimental Settings
Main Results
Ablation Study
Potential in Real-World Embodied Creative Tasks
Conclusion
Limitations
Broader Impacts
Computational Resources
...and 11 more sections

Figures (17)

Figure 1: (a) Agents for well-defined long-horizon tasks vs. (b) the Luban agent for creative tasks.
Figure 2: Diagram of the Luban agent. (a) The 3D structural speculation stage uses a VLM to synthesize the instructions $I$ into a CAD program representing the building as 3D objects, which involves decomposition, subcomponent generation, and assembly. Visual verification evaluates the quality of the buildings from the rendered appearance of the CAD program's construction. (b) The construction stage uses a VLM to synthesize the building's 3D object program into executable construction actions that produce the building in the environment. Pragmatic verification evaluates the building's pragmatism by generating environment-relevant functionality annotations and action verification programs.
Figure 3: Showcases of Luban's creations on all tasks.
Figure 4: Polar chart of the multi-dimensional quality ratings of creations from Luban and other baseline methods. The results are grouped by task and averaged across all seeds and human evaluators, with 1-sigma error bars.
Figure 5: Polar chart of the multi-dimensional quality ratings of creations from Luban and ablation baselines. The results are grouped by task and averaged across all seeds and human evaluators, with 1-sigma error bars.
...and 12 more figures
