EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model

Zefu Lin, Rongxu Cui, Chen Hanning, Xiangyu Wang, Junjia Xu, Xiaojuan Jin, Chen Wenbo, Hui Zhou, Lue Fan, Wenling Li, Zhaoxiang Zhang

TL;DR

EmbodiedCoder tackles open-world mobile manipulation without extensive data collection by converting natural language instructions into executable code that parameterizes object geometry and synthesizes trajectories. The approach combines scene understanding with semantic grounding, code-driven geometric parameterization, and constraint-aware trajectory generation to bridge perception and low-level control in a transparent, zero-shot manner. Real-robot experiments show strong generalization to novel objects and environments and competitive performance against data-driven policies and other code-generation methods, particularly in long-horizon tasks like door opening. This work advances interpretable, training-free integration of high-level reasoning with robotic manipulation, enabling more flexible and scalable robot intelligence.

Abstract

Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have improved robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability. In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning. This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-term tasks and generalizes effectively to novel objects and environments. Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at: https://embodiedcoder.github.io/EmbodiedCoder/

Paper Structure

This paper contains 22 sections, 1 equation, 7 figures, 6 tables.

Figures (7)

  • Figure 1: EmbodiedCoder employs code generation to bridge perception and manipulation by parameterizing objects and synthesizing task-specific trajectories. The figure shows the subtask "Open the door", derived from a long-term instruction. Through code-driven geometric parameterization, the door is represented as a parametric model with a hinge axis, and the system generates code that synthesizes a semicircular trajectory consistent with this geometry. The robotic arm then executes the opening motion by following waypoints sampled from the generated trajectory, demonstrating how coding enables functional manipulation without additional training.
  • Figure 2: Overview of the proposed system pipeline. The system consists of three modules: (i) Scene understanding and task decomposition, which processes RGB-D images with a VLM and VGGT to build semantic maps and decompose instructions into subtasks; (ii) EmbodiedCoder, which prompts a coding model to perform code-driven geometric parameterization of objects and trajectory synthesis under physical and environmental constraints; and (iii) Motion execution, which samples waypoints from the synthesized trajectory and executes the manipulation with the robot arm.
  • Figure 3: Example of door parameterization. The task-relevant content in the prompt comes from Task Decomposition and Object-centric Semantic Understanding.
  • Figure 4: Examples of parameterization results for common objects.
  • Figure 5: Example of apple placement with obstacle avoidance. The task-relevant content in the prompt is obtained through subtask decomposition and geometric parameterization.
  • ...and 2 more figures
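
To make the door example in Figure 1 more concrete, the following is a minimal sketch of what parameterizing a door by its hinge axis and sampling waypoints along the resulting arc could look like. It is not the code that EmbodiedCoder generates: the `DoorModel` class, its fields, and the `door_arc_waypoints` function are hypothetical names introduced here for illustration, and the sketch assumes a vertical hinge axis, a planar arc, and no collision or kinematic checks.

```python
import math
from dataclasses import dataclass


@dataclass
class DoorModel:
    """Hypothetical parametric door with a vertical hinge axis (illustrative only)."""
    hinge_x: float        # x position of the hinge axis in the map frame (m)
    hinge_y: float        # y position of the hinge axis in the map frame (m)
    handle_radius: float  # horizontal distance from hinge axis to handle (m)
    handle_height: float  # height of the handle above the floor (m)
    closed_angle: float   # angle of the door plane when closed (rad)


def door_arc_waypoints(door, open_angle, n):
    """Sample n waypoints (x, y, z, yaw) along the handle's circular arc.

    The handle stays at a fixed radius from the hinge axis, so rotating the
    door by open_angle sweeps the handle along an arc of that radius.
    """
    if n < 2:
        raise ValueError("need at least two waypoints")
    waypoints = []
    for i in range(n):
        # Interpolate the door angle linearly from closed to fully opened.
        theta = door.closed_angle + open_angle * i / (n - 1)
        x = door.hinge_x + door.handle_radius * math.cos(theta)
        y = door.hinge_y + door.handle_radius * math.sin(theta)
        # Use the door angle as the gripper yaw so it rotates with the door.
        waypoints.append((x, y, door.handle_height, theta))
    return waypoints


# Example values are made up for illustration, not measurements from the paper.
door = DoorModel(hinge_x=0.0, hinge_y=0.0, handle_radius=0.8,
                 handle_height=1.0, closed_angle=0.0)
waypoints = door_arc_waypoints(door, open_angle=math.pi / 2, n=5)
```

Each waypoint keeps the handle at a constant distance from the hinge, which is the geometric constraint the caption refers to when it says the trajectory is consistent with the door's parametric model. A real system would additionally need to map these waypoints into the arm's frame and respect reachability limits.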