Evolution 6.0: Evolving Robotic Capabilities Through Generative Design

Muhammad Haris Khan, Artyom Myshlyaev, Artem Lykov, Miguel Altamirano Cabrera, Dzmitry Tsetserukou

TL;DR

Evolution 6.0 addresses autonomous tool design for robots operating in open-ended environments by integrating Vision-Language Models, Vision-Language Action models, and Text-to-3D generative models into two coordinating modules: Tool Generation and Action Generation. The reported results are a 90% tool-generation success rate, 83.5% on physical and visual generalization, 70% on motion generalization, and 37% on semantic generalization, so semantic generalization is the clearest area for refinement. By enabling robots to perceive, plan, and fabricate task-specific tools on the fly, the framework targets self-sufficient, flexible robotics for challenging settings such as planetary exploration or unstructured industrial spaces.

Abstract

We propose a new concept, Evolution 6.0, which represents the evolution of robotics driven by Generative AI. When a robot lacks the necessary tools to accomplish a task requested by a human, it autonomously designs the required instruments and learns how to use them to achieve the goal. Evolution 6.0 is an autonomous robotic system powered by Vision-Language Models (VLMs), Vision-Language Action (VLA) models, and Text-to-3D generative models for tool design and task execution. The system comprises two key modules: the Tool Generation Module, which fabricates task-specific tools from visual and textual data, and the Action Generation Module, which converts natural language instructions into robotic actions. It integrates QwenVLM for environmental understanding, OpenVLA for task execution, and Llama-Mesh for 3D tool generation. Evaluation results demonstrate a 90% success rate for tool generation with a 10-second inference time, and action generation achieving 83.5% in physical and visual generalization, 70% in motion generalization, and 37% in semantic generalization. Future improvements will focus on bimanual manipulation, expanded task capabilities, and enhanced environmental interpretation to improve real-world adaptability.
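
As a rough illustration of how the two modules described above could be wired together, the following Python sketch chains a scene-to-description step, a text-to-3D step, and an action-planning step. It is only a sketch under stated assumptions: `ToolDesign`, `run_pipeline`, and the three callables are hypothetical names introduced here, standing in for the VLM, the Text-to-3D model, and the VLA model. None of them is the authors' code or the API of QwenVLM, Llama-Mesh, or OpenVLA.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToolDesign:
    """Hypothetical container for the output of the Tool Generation Module."""
    description: str  # textual tool description derived from the scene and task
    mesh: str         # generated mesh as text (assumed format, e.g. OBJ)


def run_pipeline(
    image: bytes,
    task: str,
    describe_tool: Callable[[bytes, str], str],               # stand-in for the VLM
    generate_mesh: Callable[[str], str],                      # stand-in for Text-to-3D
    plan_actions: Callable[[bytes, str], List[List[float]]],  # stand-in for the VLA
) -> Tuple[ToolDesign, List[List[float]]]:
    """Run tool generation, then action generation, for one task request."""
    # Tool Generation Module: scene + instruction -> description -> mesh.
    description = describe_tool(image, task)
    mesh = generate_mesh(description)
    # Action Generation Module: scene + instruction -> list of action vectors.
    actions = plan_actions(image, task)
    return ToolDesign(description=description, mesh=mesh), actions
```

Passing the models in as plain callables keeps the sketch independent of any particular model library; in a real system, rendering and 3D printing would sit between the mesh step and the action step.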

Paper Structure

This paper contains 10 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: The Evolution 6.0 system workflow, which autonomously analyzes a scene, generates a task-specific tool, and executes actions. The process includes scene interpretation, tool generation via an autoregressive model, rendering, 3D printing, and action execution using a 7D action vector derived from task instructions via VLA.
  • Figure 2: Evolution 6.0 system architecture.
  • Figure 3: CAD model design by Generative AI.
  • Figure 4: Evaluation tasks. (a) Visual generalization: presence of several unseen distractor objects. (b) Visual generalization: unseen background. (c) Motion generalization: the plate is elevated from its original position. (d) Physical generalization: the plate has a different size than the one used in the dataset. (e) All types of generalization: the cake is initialized in an unseen position and orientation, the background is changed, and a distractor object is present in the scene.
  • Figure 5: Phase two results: success rates across generalization categories.
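
The 7D action vector mentioned in the Figure 1 caption can be pictured with a small Python sketch. The field layout below (three translation deltas, three rotation deltas, one gripper command) is an assumption based on a common end-effector control convention; the source states only that the vector is seven-dimensional. `Action7D` and `from_vector` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Action7D:
    """Hypothetical view of a 7D action vector (assumed field layout)."""
    dx: float       # translation deltas of the end effector
    dy: float
    dz: float
    droll: float    # rotation deltas of the end effector
    dpitch: float
    dyaw: float
    gripper: float  # gripper command (assumed: open/close value)

    @classmethod
    def from_vector(cls, values: Sequence[float]) -> "Action7D":
        """Build an action from a flat sequence, rejecting the wrong length."""
        if len(values) != 7:
            raise ValueError(f"expected 7 values, got {len(values)}")
        return cls(*(float(v) for v in values))
```

Validating the length at construction time catches a malformed model output before it reaches the robot controller.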