CoinRobot: Generalized End-to-end Robotic Learning for Physical Intelligence

Yu Zhao, Huxian Liu, Xiang Chen, Jiankai Sun, Jiahuan Yan, Luhui Hu

TL;DR

CoinRobot addresses the challenge of generalizing end-to-end robotic learning across heterogeneous hardware and tasks. It combines a diffusion-based action policy with a modular perception and control stack, enabling cross-platform deployment and minimal task-specific customization. The work demonstrates seven real-world tasks with diffusion policies outperforming a LeRobot baseline and shows multi-task and cross-view generalization capabilities. This framework and its open-source datasets and models aim to democratize robust embodied intelligence across diverse robotic systems.

Abstract

Physical intelligence holds immense promise for advancing embodied intelligence, enabling robots to acquire complex behaviors from demonstrations. However, achieving generalization and transfer across diverse robotic platforms and environments requires careful design of model architectures, training strategies, and data diversity. Meanwhile, existing systems often struggle with scalability, adaptability to heterogeneous hardware, and objective evaluation in real-world settings. We present a generalized end-to-end robotic learning framework designed to bridge this gap. Our framework introduces a unified architecture that supports cross-platform adaptability, enabling seamless deployment across industrial-grade robots, collaborative arms, and novel embodiments without task-specific modifications. By integrating multi-task learning with streamlined network designs, it achieves more robust performance than conventional approaches while maintaining compatibility with varying sensor configurations and action spaces. We validate our framework through extensive experiments on seven manipulation tasks. Notably, diffusion-based models trained in our framework demonstrated superior performance and generalizability compared to the LeRobot framework, achieving performance improvements across diverse robotic platforms and environmental conditions.

Paper Structure

This paper contains 17 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of the framework: a real-world robot learning setup can be constructed using a variety of robots and data collection methods, and is adaptable to multiple model structures.
  • Figure 2: End-to-End Framework: The pipeline illustrates the end-to-end process for a generalized robotic learning implementation, from hardware setup and task design to data collection, modeling and training, evaluation, and model deployment. This framework is designed to be structurally simple and economically feasible for deployment.
  • Figure 3: Real-world task design: Each task features distinct visual state changes. The left image of each subfigure shows the initial state of the environment; the right image shows the goal state. See the Task Design section for a detailed task description.
  • Figure 4: Comparison of training requirements and performance between individual-task models and a multitask model subsequently fine-tuned for only 50 epochs on a pretrained individual-task model.
  • Figure : (a) 2nd camera: front view (40 demos)
  • ...and 3 more figures