RoboDesign1M: A Large-scale Dataset for Robot Design Understanding

Tri Le, Toan Nguyen, Quang Tran, Quang Nguyen, Baoru Huang, Hoan Nguyen, Minh Nhat Vu, Tung D. Ta, Anh Nguyen

TL;DR

RoboDesign1M addresses the lack of large-scale, domain-specific robot-design data with a multimodal dataset of over 1M samples, collected from scientific literature through a semi-automated pipeline and augmented with a visual instruction-following portion generated with LLMs. The dataset supports training and evaluating foundation models on robot-design tasks such as visual question answering (VQA), cross-modal design search, and text-to-design image generation; models finetuned on RoboDesign1M show improved generalization and realism. The key contributions are the data collection and filtering pipeline, statistics illustrating the dataset's diversity and reliability, and cross-dataset experiments showing RoboDesign1M's utility beyond its own domain. The work provides a large multimodal resource and benchmarks for future research on AI-driven robot-design automation, with public release planned.

Abstract

Robot design is a complex and time-consuming process that requires specialized expertise. Gaining a deeper understanding of robot design data can enable various applications, including automated design generation, retrieving example designs from text, and developing AI-powered design assistants. While recent advancements in foundation models present promising approaches to addressing these challenges, progress in this field is hindered by the lack of large-scale design datasets. In this paper, we introduce RoboDesign1M, a large-scale dataset comprising 1 million samples. Our dataset features multimodal data collected from scientific literature, covering various robotics domains. We propose a semi-automated data collection pipeline, enabling efficient and diverse data acquisition. To assess the effectiveness of RoboDesign1M, we conduct extensive experiments across multiple tasks, including design image generation, visual question answering about designs, and design image retrieval. The results demonstrate that our dataset serves as a challenging new benchmark for design understanding tasks and has the potential to advance research in this field. RoboDesign1M will be released to support further developments in AI-driven robotic design automation.

Paper Structure

This paper contains 14 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: We introduce RoboDesign1M, a new large-scale dataset with 1M samples covering various designs from different robotic disciplines.
  • Figure 2: Dataset creation pipeline.
  • Figure 3: Dataset Statistics. We provide statistics on (a) caption length, (b) caption vocabulary, and (c) image keywords.
  • Figure 4: Visual Instruction-Following Data. Caption and Reference Text are extracted from the documents, while question-answer pairs are generated using Llama 3.3 70B [meta2025llama33].
  • Figure 5: Qualitative VQA Results. Answers from Qwen2-VL models finetuned on three datasets: Text2CAD [khan2024text2cad], Ghezelbash [ghezelbash2024mechanical], and our RoboDesign1M. Correct answers are highlighted in green, and incorrect answers are highlighted in red.
  • ...and 3 more figures