Natural Selection via Foundation Models for Soft Robot Evolution

Changhe Chen, Xiaohao Xu, Xiangdong Wang, Xiaonan Huang

TL;DR

This work targets the challenge of designing soft robots by introducing RoboCrafter-QA, a multimodal benchmark built on EvoGym to test embodied design reasoning in LLMs. Initial evaluations reveal that state-of-the-art models struggle with fine-grained design distinctions, motivating a finetuning pipeline using LoRA on an efficient open-source LLM that achieves state-of-the-art performance on the benchmark for both design selection and direct morphology generation. The authors validate the approach with a physical replica of the modular soft robot, showing a strong sim-to-real correlation. Overall, the paper argues that specialized, data-driven instruction tuning can make LLMs effective co-designers of soft-robot morphologies, and the authors plan to open-source the full system for future embodied design research.

Abstract

Designing soft robots is a complex and iterative process that demands cross-disciplinary expertise in materials science, mechanics, and control, often relying on intuition and extensive experimentation. While foundation models, especially Large Language Models (LLMs), have demonstrated impressive reasoning abilities, their capacity to conduct embodied design remains largely unexplored. This paper introduces RoboCrafter-QA, a novel benchmark to evaluate whether LLMs can learn representations of soft robot designs that effectively bridge the gap between high-level task descriptions and low-level morphological and material choices. RoboCrafter-QA leverages the EvoGym simulator to generate a diverse set of soft robot design challenges, spanning robotic locomotion, manipulation, and balancing tasks. Our experiments with SOTA multi-modal LLMs reveal that while these models exhibit promising capabilities in learning design representations, they struggle with fine-grained distinctions between designs with subtle performance differences. To overcome these limitations, we finetune an efficient, open-source LLM that achieves SOTA performance on our benchmark, demonstrating superior capabilities in both design selection and direct generation of high-performing robot morphologies. Furthermore, we construct a physical replica of the modular soft robot and demonstrate a strong sim-to-real correlation, validating that superior benchmark performance has the potential to translate to effective real-world design selection. Our full system will be open-sourced to foster this exciting direction.
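To make the design-selection setting concrete, here is a minimal sketch of how a pairwise question could be assembled from two voxel-grid designs and scored against simulated rewards. It is an illustration under stated assumptions, not the authors' pipeline: the `Candidate` class, the `build_question` and `accuracy` helpers, the prompt wording, and the reward values are all hypothetical, and the voxel codes are assumed to follow EvoGym's convention (0 empty, 1 rigid, 2 soft, 3 horizontal actuator, 4 vertical actuator).

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed voxel legend, following EvoGym's integer encoding of robot bodies.
VOXEL_NAMES = {
    0: "empty",
    1: "rigid",
    2: "soft",
    3: "horizontal actuator",
    4: "vertical actuator",
}


@dataclass
class Candidate:
    """A robot design (voxel grid) with its simulated reward (hypothetical container)."""
    grid: list[list[int]]
    reward: float  # used only to derive the ground truth, never shown to the model


def format_grid(grid: list[list[int]]) -> str:
    """Render a voxel grid as space-separated rows of integers."""
    return "\n".join(" ".join(str(v) for v in row) for row in grid)


def build_question(task: str, a: Candidate, b: Candidate) -> tuple[str, str]:
    """Return (prompt, ground_truth) for a two-way design-selection question."""
    legend = ", ".join(f"{code}={name}" for code, name in VOXEL_NAMES.items())
    prompt = (
        f"Task: {task}\n"
        f"Voxel legend: {legend}\n"
        f"Design A:\n{format_grid(a.grid)}\n"
        f"Design B:\n{format_grid(b.grid)}\n"
        "Which design achieves the higher reward? Answer A or B."
    )
    # Ground truth is whichever design scored higher in simulation.
    answer = "A" if a.reward > b.reward else "B"
    return prompt, answer


def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of questions where the model's choice matches the ground truth."""
    if not answers:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Reward values below are made up for illustration.
    design_a = Candidate(grid=[[3, 3], [4, 0]], reward=5.0)
    design_b = Candidate(grid=[[1, 1], [2, 2]], reward=1.5)
    prompt, truth = build_question("Walk forward on flat terrain", design_a, design_b)
    print(prompt)
    print("Ground truth:", truth)
```

In a sketch like this, the difficulty of a question depends on how close the two rewards are, which is one way to read the abstract's point about designs with subtle performance differences.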

Paper Structure

This paper contains 17 sections, 2 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Conceptual comparison of natural creature and artificial robot evolution paradigms. (a) Evolution-driven emergence of creatures in nature through selective pressures; (b) Traditional human-engineered robot design guided by intuition and expertise; (c) AI-driven robot design selection, where foundation models (e.g., Large Language Models) act as a 'natural selector' for robot design evolution (e.g., modular robots shown in the figure). This shift highlights the transition from biological evolution to human-driven engineering and finally to AI-empowered selection.
  • Figure 2: RoboCrafter-QA robot design data generation and LLM evaluation pipeline.
  • Figure 3: Overview of robot design tasks curated from EvoGym (Bhatia et al., 2021) for the RoboCrafter-QA benchmark.
  • Figure 4: Example RoboCrafter-QA question and LLM responses. This figure illustrates a sample question from the benchmark, showing the two robot designs (color-coded from input matrices), the question prompt, and the LLM responses with the ground truth. The simulated locomotion of the robot designs is illustrated at the bottom.
  • Figure 5: Example of robot designs with varied rewards.
  • ...and 2 more figures