BIKED++: A Multimodal Dataset of 1.4 Million Bicycle Image and Parametric CAD Designs

Lyle Regenwetter, Yazan Abu Obaideh, Amin Heyrani Nobari, Faez Ahmed

TL;DR

The paper introduces a public multimodal dataset of 1.4 million parametric bicycle designs paired with rasterized images and CLIP embeddings, with the images produced by a BikeCAD-based rendering pipeline. It details an end-to-end methodology: Sobol-based sampling in a 96-dimensional parametric space, constraint filtering, conversion to BikeCAD XML, rendering to SVG/PNG, and CLIP-embedding computation with a Vision Transformer (ViT). A residual network supplements this pipeline by predicting the embeddings directly from parameters, avoiding the cost of rendering. A concrete application demonstrates cross-modal optimization that aligns parametric designs with text prompts in CLIP space: the embedding predictor achieves high fidelity, and text-driven optimization produces effective design changes. The dataset, code, and models enable researchers to build cross-modal design tools and benchmarks, advancing multimodal reasoning in engineering design within the BIKED ecosystem.
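The first two pipeline steps (Sobol sampling in the 96-dimensional parameter space, then constraint filtering) can be sketched as follows. This is a minimal sketch assuming SciPy's `scipy.stats.qmc.Sobol` sampler; the bounds, the toy constraint, and the helper names `sample_designs` and `filter_valid` are hypothetical illustrations, not the authors' actual parameter ranges or validity checks. XML conversion and rendering are not shown.

```python
import numpy as np
from scipy.stats import qmc

N_PARAMS = 96  # dimensionality of the parametric space stated in the paper


def sample_designs(l_bounds, u_bounds, m):
    """Draw 2**m scrambled Sobol points and scale them into the given bounds."""
    sampler = qmc.Sobol(d=len(l_bounds), scramble=True)
    unit = sampler.random_base2(m=m)            # low-discrepancy points in [0, 1)^d
    return qmc.scale(unit, l_bounds, u_bounds)  # affine map to [lower, upper)


def filter_valid(designs, constraints):
    """Keep only rows that satisfy every constraint (each returns a boolean mask)."""
    mask = np.ones(len(designs), dtype=bool)
    for check in constraints:
        mask &= check(designs)
    return designs[mask]


# Hypothetical bounds and constraint, for illustration only.
lower = np.zeros(N_PARAMS)
upper = np.full(N_PARAMS, 100.0)
toy_constraints = [lambda X: X[:, 0] < X[:, 1]]  # e.g. "parameter 0 must be below parameter 1"

designs = sample_designs(lower, upper, m=10)     # 1024 candidate designs
valid = filter_valid(designs, toy_constraints)
print(designs.shape, len(valid))
```

Sampling a power of two (`random_base2`) preserves the balance properties of the Sobol sequence; invalid designs are discarded afterward rather than constrained during sampling.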

Abstract

This paper introduces a public dataset of 1.4 million procedurally generated bicycle designs represented parametrically, as JSON files, and as rasterized images. The dataset is created using a rendering engine that harnesses the BikeCAD software to generate vector graphics from parametric designs. This rendering engine is described in the paper and released publicly alongside the dataset. Though this dataset has numerous applications, a principal motivation is the need to train cross-modal predictive models between parametric and image-based design representations. For example, we demonstrate that a predictive model can be trained to accurately estimate Contrastive Language-Image Pretraining (CLIP) embeddings directly from a parametric representation. This allows similarity relations to be established between parametric bicycle designs and text strings or reference images. Trained predictive models are also made public. The dataset joins the BIKED dataset family, which includes thousands of mixed-representation human-designed bicycle models and several datasets quantifying design performance. The code and dataset can be found at: https://github.com/Lyleregenwetter/BIKED_multimodal/tree/main
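Once designs have CLIP embeddings (computed or predicted), similarity to a text string or reference image reduces to cosine similarity in the shared embedding space. A minimal sketch, assuming the embeddings are already available as arrays; the helper names `normalize` and `rank_by_similarity` and the toy 4-dimensional vectors are hypothetical and stand in for real CLIP outputs.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def rank_by_similarity(design_embeddings, prompt_embedding):
    """Rank designs by cosine similarity to a text or image prompt embedding."""
    scores = normalize(design_embeddings) @ normalize(prompt_embedding)
    order = np.argsort(-scores)  # most similar design first
    return order, scores


# Toy vectors stand in for CLIP embeddings: in practice the design rows would come
# from the parameter-to-embedding model and the prompt from a CLIP text or image encoder.
design_embeddings = [[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.7, 0.7, 0.0, 0.0]]
prompt_embedding = [1.0, 0.2, 0.0, 0.0]

order, scores = rank_by_similarity(design_embeddings, prompt_embedding)
print(order)  # [0 2 1]
```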

Paper Structure (15 sections, 2 equations, 5 figures)

Figures (5)

  • Figure 1: The dataset aims to establish a critical link between parametrically represented bicycle designs and bicycle images, enabling cross-modal comparisons between: 1) Text or images and 2) Parametric designs or their associated functional performance attributes.
  • Figure 2: Overview of the dataset generation methodology. Components included in the final dataset are highlighted in yellow. Operations and code colored in gray are included in the codebase. Components colored in blue are not included due to storage limitations.
  • Figure 3: We replace the computationally expensive rendering, rasterization, embedding calculation, and view-averaging process with a single residual network model, which predicts the final CLIP embedding without explicitly computing it.
  • Figure 4: Given a method to estimate CLIP embeddings for parametric designs, similarity can easily be calculated with respect to arbitrary text and image prompts, whose embeddings can also be calculated.
  • Figure 5: Original design and design optimized to look like "a yellow mountain bike."
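Text-driven optimization like that in Figure 5 can be illustrated with a simple gradient-free loop that perturbs design parameters and keeps only changes that raise cosine similarity to a prompt embedding. This is a hedged sketch: greedy random-perturbation hill climbing is swapped in for illustration and is not necessarily the optimizer the authors use. The linear `surrogate` and random `target` vector are hypothetical stand-ins for the trained residual network and for the CLIP text embedding of "a yellow mountain bike."

```python
import numpy as np

rng = np.random.default_rng(0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def optimize_toward_prompt(x0, surrogate, target, lower, upper, steps=500, step_size=0.05):
    """Greedy hill climbing: accept a random perturbation only if it raises similarity."""
    x = x0.copy()
    best = cosine(surrogate(x), target)
    for _ in range(steps):
        # Gaussian step scaled to each parameter's range, then clipped back into bounds.
        step = step_size * (upper - lower) * rng.standard_normal(x.shape)
        candidate = np.clip(x + step, lower, upper)
        score = cosine(surrogate(candidate), target)
        if score > best:
            x, best = candidate, score
    return x, best


n_params, emb_dim = 96, 16
W = rng.standard_normal((emb_dim, n_params))  # hypothetical surrogate weights


def surrogate(x):
    """Hypothetical stand-in for the parameter-to-CLIP-embedding residual network."""
    return W @ x


target = rng.standard_normal(emb_dim)  # hypothetical stand-in for the prompt embedding
lower, upper = np.zeros(n_params), np.ones(n_params)
x0 = rng.uniform(lower, upper)

x_opt, score = optimize_toward_prompt(x0, surrogate, target, lower, upper)
print(f"similarity: {cosine(surrogate(x0), target):.3f} -> {score:.3f}")
```

Because a candidate is accepted only when it improves the score, similarity never decreases, and clipping keeps every design inside the sampled parameter bounds.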