Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning

Haoming Ye, Yunxiao Xiao, Cewu Lu, Panpan Cai

TL;DR

UniDomain addresses the challenge of grounding long-horizon robot task planning in real-world constraints by pretraining a unified PDDL domain from thousands of demonstrations. It blends energy-based keyframe extraction, VLM/LLM-driven domain construction with closed-loop verification, and hierarchical domain fusion to produce task-relevant meta-domains for online planning. Empirical results across four unseen task domains show substantial improvements in success and plan optimality over strong LLM-only and hybrid baselines, highlighting the value of data-driven symbolic grounding for compositional generalization. The framework promises scalable, zero-shot symbolic planning in real-world manipulation, with future work aimed at richer PDDL variants and handling perceptual uncertainty.

Abstract

Robotic task planning in real-world environments requires reasoning over implicit constraints from language and vision. While LLMs and VLMs offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, limiting generalization. We propose UniDomain, a framework that pre-trains a PDDL domain from robot manipulation demonstrations and applies it for online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3,137 operators, 2,875 predicates, and 16,481 causal edges. Given a target class of tasks, it retrieves relevant atomics from the unified domain and systematically fuses them into high-quality meta-domains to support compositional generalization in planning. Experiments on diverse real-world tasks show that UniDomain solves complex, unseen tasks in a zero-shot manner, achieving up to 58% higher task success and 160% improvement in plan optimality over state-of-the-art LLM and LLM-PDDL baselines.
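
To make the retrieve-then-fuse idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes an atomic domain is a small set of operators with precondition and effect predicates, that retrieval is plain keyword matching on operator names, and that fusion unions the predicates of same-named operators. All names here (`Operator`, `AtomicDomain`, `retrieve`, `fuse`, `causal_edges`, and the toy library) are hypothetical; the paper's own retrieval and hierarchical fusion are LLM-driven and more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # predicate names that must hold before acting
    effects: frozenset        # predicate names made true by acting


@dataclass(frozen=True)
class AtomicDomain:
    source: str       # hypothetical demonstration identifier
    operators: tuple  # operators extracted from that one demonstration


def retrieve(domains, keywords):
    """Naive stand-in for retrieval: keep domains with an operator named by a keyword."""
    return [d for d in domains if any(op.name in keywords for op in d.operators)]


def fuse(domains):
    """Naive stand-in for fusion: merge same-named operators by unioning predicates."""
    merged = {}
    for d in domains:
        for op in d.operators:
            pre, eff = merged.get(op.name, (frozenset(), frozenset()))
            merged[op.name] = (pre | op.preconditions, eff | op.effects)
    return [Operator(n, pre, eff) for n, (pre, eff) in sorted(merged.items())]


def causal_edges(operators):
    """Directed edges: precondition -> operator and operator -> effect."""
    edges = set()
    for op in operators:
        edges |= {(p, op.name) for p in op.preconditions}
        edges |= {(op.name, p) for p in op.effects}
    return edges


# Toy library of three atomic domains (illustrative data, not from the paper).
library = [
    AtomicDomain("demo_pick", (
        Operator("pick", frozenset({"hand_empty", "on_table"}), frozenset({"holding"})),
    )),
    AtomicDomain("demo_place", (
        Operator("place", frozenset({"holding"}), frozenset({"in_container", "hand_empty"})),
    )),
    AtomicDomain("demo_wipe", (
        Operator("wipe", frozenset({"holding_sponge"}), frozenset({"clean"})),
    )),
]

# Build a task-specific meta-domain for a pick-and-place class of tasks.
meta = fuse(retrieve(library, {"pick", "place"}))
edges = causal_edges(meta)
predicates = set().union(*(op.preconditions | op.effects for op in meta))
```

In this toy setting the meta-domain keeps only the operators relevant to the task class, and the causal edges link `pick` to `place` through the shared `holding` predicate. That sharing of predicates across demonstrations is the kind of structure that lets a symbolic planner chain skills that were never demonstrated together.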

Paper Structure

This paper contains 45 sections, 7 equations, 21 figures.

Figures (21)

  • Figure 1: Visualization of our pre-trained unified domain, with 3,137 operator nodes (green) and 2,875 predicate nodes (purple).
  • Figure 2: Overview of UniDomain. See the overview section for detailed descriptions.
  • Figure 3: Comparison of UniDomain and state-of-the-art methods on unseen evaluation tasks: (a) success rates $\uparrow$, success-weighted relative path lengths $\uparrow$, and optimality rates with thresholds ($K=2,1,0$) $\uparrow$; (b) thinking time (s) $\downarrow$ of the top-performing methods; (c) number of LLM calls $\downarrow$ of the top-performing methods. Average values are shown with standard errors.
  • Figure 4: Results for ablation studies on domain generation: (a) ablation on the atomic domain learning method; (b) ablation on the domain fusion method. All values are success rates $\uparrow$ with standard errors.
  • Figure 5: Results for ablation study of the UniDomain planner. Each bar shows average task success rates $\uparrow$ with standard errors.
  • ...and 16 more figures