GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design

Phillip Mueller, Sebastian Mueller, Lars Mikelsons

TL;DR

GeoBiked tackles domain-specific data scarcity in engineering design by providing a dataset of 4,355 bicycle images annotated with 12 geometric reference points and design and technical features. It introduces two automated labeling approaches: geometric point detection via Diffusion Hyperfeatures and diverse text descriptions via GPT-4o. Both are evaluated on 150 labeled samples and the full dataset. The results show that multiple annotated reference images improve geometric-point accuracy, and that combining images with categorical labels as input balances diversity with factuality, as quantified by diversity and accuracy metrics. The work delivers a practical, training-free labeling pipeline and guidance on prompting and input configurations, enabling conditioning and evaluation of deep generative models in engineering design. Together, these contributions establish GeoBiked as a foundation for training, finetuning, and conditioning DGMs in domain-specific engineering contexts and point to open directions for metric development and dataset expansion.

Abstract

We provide a dataset for enabling Deep Generative Models (DGMs) in engineering design and propose methods to automate data labeling by utilizing large-scale foundation models. GeoBiked is curated to contain 4,355 bicycle images annotated with structural and technical features, and is used to investigate two automated labeling techniques: the use of consolidated latent features (Hyperfeatures) from image-generation models to detect geometric correspondences (e.g., the position of the wheel center) in structural images, and the generation of diverse text descriptions for structural images. GPT-4o, a vision-language model (VLM), is instructed to analyze images and produce diverse descriptions aligned with the system prompt. Representing technical images as Diffusion Hyperfeatures makes it possible to draw geometric correspondences between them. The detection accuracy of geometric points in unseen samples improves when multiple annotated source images are presented. GPT-4o has sufficient capabilities to generate accurate descriptions of technical images. Grounding the generation only on images leads to diverse descriptions but causes hallucinations, while grounding it only on categorical labels restricts the diversity. Using both as input balances creativity and accuracy. The successful use of Hyperfeatures for geometric correspondence suggests that this approach can be used for general point-detection and annotation tasks in technical images. Labeling such images with text descriptions using VLMs is possible, but depends on the model's detection capabilities, careful prompt engineering, and the selection of input information. Applying foundation models in engineering design is largely unexplored. We aim to bridge this gap by providing a dataset for exploring the training, finetuning, and conditioning of DGMs in this field, and by suggesting approaches to bootstrap foundation models to process technical images.
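
The idea of locating a geometric point in an unseen image from several annotated source images can be illustrated with a minimal sketch. It assumes that each image has already been turned into a dense feature map of shape (H, W, C), for example by a Hyperfeature extractor, and that a point is matched by cosine similarity against a query descriptor averaged over the sources. The function name `locate_point`, the averaging step, and the argmax matching are assumptions made for illustration, not the procedure confirmed by the paper.

```python
import numpy as np

def locate_point(source_feats, source_points, target_feat):
    """Predict the (row, col) of one annotated point in a target feature map.

    source_feats  : list of (H, W, C) arrays, one per annotated source image
    source_points : list of (row, col) annotations, one per source image
    target_feat   : (H, W, C) array for the unlabeled target image
    (Hypothetical interface; the averaging of descriptors is an assumption.)
    """
    eps = 1e-8
    # Collect the L2-normalised descriptor at the annotated location of each source.
    descriptors = []
    for feat, (r, c) in zip(source_feats, source_points):
        d = feat[r, c]
        descriptors.append(d / (np.linalg.norm(d) + eps))
    # Average the descriptors so that several sources vote on one query vector.
    query = np.mean(descriptors, axis=0)
    query = query / (np.linalg.norm(query) + eps)
    # Cosine similarity between the query and every target location.
    norms = np.linalg.norm(target_feat, axis=-1, keepdims=True) + eps
    similarity = (target_feat / norms) @ query
    # The best-matching location is taken as the predicted point.
    r, c = np.unravel_index(np.argmax(similarity), similarity.shape)
    return int(r), int(c)
```

With a single source the query is just that source's descriptor; adding sources smooths out descriptor noise from any one image, which is one plausible reading of why multiple annotated references help.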

Paper Structure (32 sections, 3 equations, 11 figures, 6 tables)

Figures (11)

  • Figure 1: Unrealistic samples from the original BIKED dataset (Regenwetter et al., 2021).
  • Figure 2: Geometric layout described by the geometric reference points.
  • Figure 3: Qualitative comparison of error patterns when using one, two, and three source images, respectively. Images in the left columns (red marks) are source images. Images in the right column (blue marks) are target images. Most of the inaccuracies disappear when three source images are used instead of one. Some uncertainty remains when uncommon samples are processed (see middle and last column). Red circles in the target images mark areas of inaccuracy in the point prediction. Best viewed when zoomed in.
  • Figure 4: Percentage of unique descriptions generated by GPT-4o, compared over different configurations of description characteristics. For the configuration of description characteristics, the length and character are denoted on the x-axis while the style is given for each bar of the plot. Left: Image only, Middle: Ground-Truth Label, Right: Image and Ground-Truth Label. Best viewed zoomed in.
  • Figure 5: Levenshtein distances between unique values for all three configurations of inputs.
  • ...and 6 more figures
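
The two diversity measures named in Figures 4 and 5, the percentage of unique descriptions and the Levenshtein distance between unique descriptions, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and aggregating the pairwise distances by their mean is an assumption rather than the paper's confirmed procedure.

```python
from itertools import combinations

def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def unique_percentage(descriptions):
    """Share of distinct descriptions among all generated ones, in percent."""
    if not descriptions:
        return 0.0
    return 100.0 * len(set(descriptions)) / len(descriptions)

def mean_pairwise_distance(descriptions):
    """Mean Levenshtein distance over all pairs of unique descriptions.

    Averaging is an assumption made for this sketch.
    """
    unique = sorted(set(descriptions))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)
```

Note that the pairwise computation grows quadratically with the number of unique descriptions, so for thousands of descriptions a sampled subset of pairs may be more practical.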