Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought

Huaxiaoyue Wang, Gonzalo Gonzalez-Pumariega, Yash Sharma, Sanjiban Choudhury

TL;DR

Demo2Code generates robot task code from language instructions and demonstrations using an extended chain-of-thought pipeline: it first condenses the demonstrations into a latent task specification, then expands that specification into executable code. Stage 1 recursively summarizes demonstrations into a compact specification; Stage 2 recursively expands that specification into multi-layer code that ultimately relies only on the provided APIs. Across three benchmarks (Robotouille, EPIC-Kitchens, and tabletop manipulation), Demo2Code achieves unit-test and execution success rates close to those of the oracle Spec2Code, and it outperforms baselines at grounding user preferences and generalizing to unseen objects and longer-horizon tasks. The results suggest that a latent task specification combined with iterative reasoning enables long-horizon program synthesis for robotics, with the potential to make home and service robots more accessible, customizable, and scalable. In practice, this means pairing demonstration-grounded prompts with LLMs to automate code generation for complex robotic tasks; future work must address context length, dynamic libraries, and verification feedback loops.

Abstract

Language instructions and demonstrations are two natural ways for users to teach robots personalized tasks. Recent progress in Large Language Models (LLMs) has shown impressive performance in translating language instructions into code for robotic tasks. However, translating demonstrations into task code continues to be a challenge due to the length and complexity of both demonstrations and code, making learning a direct mapping intractable. This paper presents Demo2Code, a novel framework that generates robot task code from demonstrations via an extended chain-of-thought and defines a common latent specification to connect the two. Our framework employs a robust two-stage process: (1) a recursive summarization technique that condenses demonstrations into concise specifications, and (2) a code synthesis approach that expands each function recursively from the generated specifications. We conduct extensive evaluation on various robot task benchmarks, including a novel game benchmark Robotouille, designed to simulate diverse cooking tasks in a kitchen environment. The project's website is available at https://portal-cornell.github.io/demo2code/
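
The first stage described above (recursive summarization of demonstrations into a specification) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `is_compact` stopping check, the `max_rounds` cap, and the toy stand-ins for LLM calls are all assumptions introduced here.

```python
from typing import Callable, List


def recursive_summarize(
    instruction: str,
    demonstrations: List[str],
    summarize_step: Callable[[str], str],
    is_compact: Callable[[str], bool],
    write_specification: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> str:
    """Condense each demonstration repeatedly, then merge into one specification.

    All three callables are hypothetical stand-ins for LLM prompts.
    """
    condensed = []
    for demo in demonstrations:
        text = demo
        for _ in range(max_rounds):
            if is_compact(text):  # stop once the summary is short enough
                break
            text = summarize_step(text)  # one more round of summarization
        condensed.append(text)
    # Combine the language instruction with all condensed demonstrations.
    return write_specification(instruction, condensed)


# Toy stand-ins so the sketch runs without an LLM (assumptions, not real prompts).
def toy_summarize(text: str) -> str:
    """Keep every second line, mimicking a lossy summary."""
    return "\n".join(text.splitlines()[::2])


def short_enough(text: str) -> bool:
    """Treat a summary of at most two lines as compact."""
    return len(text.splitlines()) <= 2


def toy_write_spec(instruction: str, condensed: List[str]) -> str:
    """Join the instruction and condensed demonstrations into one string."""
    return f"{instruction}: " + " | ".join(c.replace("\n", ", ") for c in condensed)


demo_a = "\n".join(f"s{i}" for i in range(8))  # 8 low-level states
demo_b = "\n".join(f"t{i}" for i in range(4))  # 4 low-level states
spec = recursive_summarize(
    "make a burger", [demo_a, demo_b], toy_summarize, short_enough, toy_write_spec
)
print(spec)
```

In a real system, each callable would wrap a prompt to an LLM; the loop structure only illustrates how a long demonstration can be reduced over several rounds before the specification is written.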

Paper Structure (132 sections, 15 figures, 7 tables, 1 algorithm)

Figures (15)

  • Figure 1: Overview of Demo2Code, which converts a language instruction and demonstrations into task code that the robot can execute. The framework recursively summarizes both down to a specification, then recursively expands the specification into executable task code with all the helper functions defined.
  • Figure 2: Recursive summarization of input demonstrations to a compact specification. (Stage 1)
  • Figure 3: Recursive expansion of the high-level code generated from the specification, where new functions are defined by the LLM along the way. (Stage 2)
  • Figure 4: Demo2Code successfully extracts task specifics in tabletop tasks. Lang2Code lacks demonstrations and randomly chooses a spatial location, while DemoNoLang2Code lacks context on what the demonstrations are for.
  • Figure 5: Demo2Code summarizes demonstrations and identifies different users' preferences on how to make a burger (e.g. whether to include lettuce or cheese) in the Robotouille simulator. It then generates personalized burger-cooking code that uses each user's preferred ingredients.
  • ...and 10 more figures
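
The recursive expansion in the Figure 3 caption (Stage 2) can be illustrated with a short sketch: find functions that the code calls but nothing defines, ask for a definition of each, and repeat until only the provided APIs remain undefined. The helper names, the API set, the `max_depth` cap, and the dictionary standing in for the LLM are assumptions made for illustration; the paper's actual prompts and stopping rule may differ.

```python
import ast
from typing import Callable, Set


def undefined_calls(code: str, known: Set[str]) -> Set[str]:
    """Return names that are called but neither defined in `code` nor in `known`."""
    tree = ast.parse(code)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    # Note: Python builtins would also be reported unless added to `known`.
    return called - defined - known


def recursive_expand(
    code: str,
    api_names: Set[str],
    define_function: Callable[[str, str], str],
    max_depth: int = 4,
) -> str:
    """Prepend definitions until every call resolves to a definition or an API."""
    for _ in range(max_depth):
        missing = undefined_calls(code, api_names)
        if not missing:
            break
        for name in sorted(missing):
            # `define_function` is a hypothetical stand-in for an LLM prompt.
            code = define_function(name, code) + "\n\n" + code
    return code


# Toy definitions in place of LLM output (illustrative only).
TOY_DEFINITIONS = {
    "make_burger": "def make_burger(patty, bun):\n    cook(patty)\n    stack(patty, bun)",
    "cook": "def cook(item):\n    move_to(item)\n    pick_up(item)",
    "stack": "def stack(top, bottom):\n    pick_up(top)\n    place(top, bottom)",
}


def toy_define(name: str, context: str) -> str:
    """Look up a canned definition; a real system would query an LLM."""
    return TOY_DEFINITIONS[name]


APIS = {"move_to", "pick_up", "place"}  # assumed robot API names
full_code = recursive_expand("make_burger('patty1', 'bun1')", APIS, toy_define)
print(full_code)
```

Here the high-level call `make_burger(...)` is expanded in two rounds: first `make_burger` itself, then the helpers `cook` and `stack` that it introduces, after which only the assumed API calls remain.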