EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model
Zefu Lin, Rongxu Cui, Chen Hanning, Xiangyu Wang, Junjia Xu, Xiaojuan Jin, Chen Wenbo, Hui Zhou, Lue Fan, Wenling Li, Zhaoxiang Zhang
TL;DR
EmbodiedCoder tackles open-world mobile manipulation without extensive data collection by converting natural language instructions into executable code that parameterizes object geometry and synthesizes trajectories. The approach combines scene understanding with semantic grounding, code-driven geometric parameterization, and constraint-aware trajectory generation to bridge perception and low-level control in a transparent, zero-shot manner. Real-robot experiments show strong generalization to novel objects and environments and competitive performance against data-driven policies and other code-generation methods, particularly in long-horizon tasks like door opening. This work advances interpretable, training-free integration of high-level reasoning with robotic manipulation, enabling more flexible and scalable robot intelligence.
Abstract
Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have improved robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability. In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning. This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-horizon tasks and generalizes effectively to novel objects and environments. Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at: https://embodiedcoder.github.io/EmbodiedCoder/
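The abstract does not show what the generated code looks like. Purely as an illustration, the sketch below shows the kind of program a coding model might emit for door opening: the door is parameterized by a hinge position and a handle grasp point, and the trajectory is sampled along the arc the handle traces about the hinge. The class `RevoluteDoor`, the function `door_opening_waypoints`, the vertical hinge axis, and the waypoint format are all hypothetical assumptions made for this example, not EmbodiedCoder's actual interface or output.

```python
import math
from dataclasses import dataclass


@dataclass
class RevoluteDoor:
    """Hypothetical geometric parameters for a hinged door (vertical hinge axis assumed)."""
    hinge_x: float   # hinge axis position in the robot base frame (m)
    hinge_y: float
    handle_x: float  # grasp point on the handle (m)
    handle_y: float
    handle_z: float
    direction: int = 1  # +1 opens counterclockwise seen from above, -1 clockwise

    @property
    def radius(self) -> float:
        """Distance from the hinge axis to the handle; fixed by the revolute constraint."""
        return math.hypot(self.handle_x - self.hinge_x, self.handle_y - self.hinge_y)


def door_opening_waypoints(door: RevoluteDoor, open_angle_deg: float, n_steps: int):
    """Sample (x, y, z, leaf_angle) waypoints along the handle's arc about the hinge.

    Every waypoint keeps the handle at door.radius from the hinge axis, so the
    planned path respects the door's single rotational degree of freedom.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    r = door.radius
    # Initial angle of the hinge-to-handle vector, i.e. the closed-door leaf angle.
    phi0 = math.atan2(door.handle_y - door.hinge_y, door.handle_x - door.hinge_x)
    sweep = door.direction * math.radians(open_angle_deg)
    waypoints = []
    for i in range(n_steps + 1):
        phi = phi0 + sweep * i / n_steps
        x = door.hinge_x + r * math.cos(phi)
        y = door.hinge_y + r * math.sin(phi)
        waypoints.append((x, y, door.handle_z, phi))
    return waypoints


if __name__ == "__main__":
    # Hypothetical door: hinge at the origin, handle 0.8 m away at 1.0 m height.
    door = RevoluteDoor(hinge_x=0.0, hinge_y=0.0, handle_x=0.8, handle_y=0.0, handle_z=1.0)
    for wp in door_opening_waypoints(door, open_angle_deg=90.0, n_steps=4):
        print("x=%.3f y=%.3f z=%.2f leaf=%.1f deg" % (wp[0], wp[1], wp[2], math.degrees(wp[3])))
```

In this sketch the hinge constraint is satisfied by construction, because waypoints are generated on the arc rather than corrected afterwards. A real system would additionally need perception to estimate the hinge and handle, grasp planning, and coordination with base motion, none of which is modeled here.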
