
MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling

Damian Boborzi, Phillip Mueller, Jonas Emrich, Dominik Schmid, Sebastian Mueller, Lars Mikelsons

TL;DR

MeshFleet targets the bottleneck of domain-specific 3D generative modeling by curating a high-quality, filtered set of vehicle CAD models from Objaverse-XL. It introduces a quality classifier trained on DINOv2 and SigLIP embeddings, refined with captions and uncertainty-driven active learning, to automate filtering and annotation. Fine-tuning SV3D on MeshFleet yields superior domain-specific generation quality compared with caption- or aesthetic-based filtering, demonstrating the value of data quality over quantity. The authors release MeshFleet and related resources to advance research in controlled, high-fidelity 3D generation for engineering applications.

Abstract

Generative models have recently made remarkable progress in the field of 3D objects. However, their practical application in fields like engineering remains limited because they fail to deliver the accuracy, quality, and controllability needed for domain-specific tasks. Fine-tuning large generative models is a promising way to make them usable in such fields. Creating high-quality, domain-specific 3D datasets is crucial for fine-tuning large generative models, yet the data filtering and annotation process remains a significant bottleneck. We present MeshFleet, a filtered and annotated 3D vehicle dataset extracted from Objaverse-XL, the most extensive publicly available collection of 3D objects. We propose a pipeline for automated data filtering based on a quality classifier. This classifier is trained on a manually labeled subset of Objaverse, incorporates DINOv2 and SigLIP embeddings, and is refined through caption-based analysis and uncertainty estimation. We demonstrate the efficacy of our filtering method through a comparison against caption-based and image-aesthetic-score-based filtering techniques and through fine-tuning experiments with SV3D, highlighting the importance of targeted data selection for domain-specific 3D generative modeling.
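The abstract mentions uncertainty estimation without saying how it is computed. As an illustration only, the sketch below assumes the classifier is an ensemble whose members each output class probabilities for an unlabeled object, and ranks objects for manual labeling by the entropy of the averaged prediction. This is a standard uncertainty-sampling rule, not necessarily the authors' method; `predictive_entropy`, `select_for_labeling`, and the toy pool are hypothetical.

```python
import math
from typing import Mapping, Sequence


def predictive_entropy(member_probs: Sequence[Sequence[float]]) -> float:
    """Entropy (in nats) of the ensemble-averaged class distribution for one object."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the per-member probability vectors class by class.
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    # Skip zero-probability classes, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def select_for_labeling(pool: Mapping[str, Sequence[Sequence[float]]], k: int) -> list[str]:
    """Return the ids of the k unlabeled objects with the highest predictive entropy."""
    ranked = sorted(pool, key=lambda uid: predictive_entropy(pool[uid]), reverse=True)
    return ranked[:k]


# Toy pool: per object, one probability vector per (hypothetical) ensemble member.
pool = {
    "obj_a": [[0.9, 0.1], [0.8, 0.2]],  # members agree and are confident
    "obj_b": [[0.5, 0.5], [0.4, 0.6]],  # members roughly agree but are unsure
    "obj_c": [[0.9, 0.1], [0.1, 0.9]],  # members disagree strongly
}
print(select_for_labeling(pool, 2))  # ['obj_c', 'obj_b']
```

Averaging before taking the entropy means that both an individually unsure prediction (`obj_b`) and disagreement between members (`obj_c`) raise the score; in an active-learning loop, the selected objects would be labeled manually and added to the training set before retraining.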

Paper Structure

This paper contains 21 sections, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Simplified overview of the quality assessment process used to generate the MeshFleet dataset. We render 4 views of each object from high-quality Objaverse-XL subsets. We use object detection, clustering, and text-based filtering to generate a subset of vehicle candidate objects, which are subsequently labeled. We then train the High-Quality Car Classifier on the labeled 3D-Car-Quality Dataset. After training, we use the classifier to automatically generate the High-Quality Car Dataset, which is finally manually reviewed and annotated.
  • Figure 2: Example views of two vehicles from the validation set: the original render (top row), SV3D without fine-tuning (2nd row), SV3D with Label 4 fine-tuning (3rd row), Label 3 fine-tuning (4th row), Label 2 fine-tuning (5th row), and MeshFleet fine-tuning (6th row).
  • Figure 3: Comparison of the manual quality labels (label 1 to label 5) with the aesthetic scores from TRELLIS500K xiang2024structured. The plot shows the frequency of aesthetic scores for each quality label. Only objects described as a car in their TRELLIS500K caption are included.
  • Figure 4: Relative share of objects in each label category for the final dataset we used to train and test the vehicle classification model. The dataset contains $6200$ objects in total. Example objects for each label are shown inside the corresponding section.
  • Figure 5: Distribution of vehicle categories within the MeshFleet dataset. The bar chart displays the frequency of each vehicle category (e.g., Sports Car, Coupe, SUV, Sedan) present in the dataset. The x-axis labels indicate the category, and the y-axis represents the number of 3D models belonging to that category.
  • ...and 11 more figures
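Figure 1 describes the automated filtering stage only at a high level. The following is a minimal sketch of that stage under several assumptions the captions do not state: each rendered view has already been embedded as a DINOv2 vector concatenated with a SigLIP vector, the views are mean-pooled into one object-level vector, a trained classifier returns probabilities over the five quality labels, and a simple whole-word keyword check stands in for the text-based filtering. The keyword list, `Candidate`, `filter_candidates`, and the toy classifier are hypothetical. Which labels count as high quality is left as the `keep_labels` parameter, because the captions do not state the direction of the 1 to 5 label scale.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical keyword list standing in for the caption-based filtering step.
VEHICLE_KEYWORDS = {"car", "cars", "vehicle", "truck", "suv", "sedan", "coupe"}


@dataclass
class Candidate:
    uid: str
    caption: str
    # One vector per rendered view; each is assumed to be a DINOv2 embedding
    # concatenated with a SigLIP embedding of the same render.
    view_embeddings: Sequence[Sequence[float]]


def mentions_vehicle(caption: str) -> bool:
    """Whole-word keyword match, so e.g. 'cartoon' does not count as 'car'."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in caption.lower())
    return any(token in VEHICLE_KEYWORDS for token in cleaned.split())


def mean_pool(views: Sequence[Sequence[float]]) -> list[float]:
    """Average the per-view vectors into a single object-level vector."""
    return [sum(v[i] for v in views) / len(views) for i in range(len(views[0]))]


def filter_candidates(
    candidates: Sequence[Candidate],
    classify: Callable[[list[float]], Sequence[float]],
    keep_labels: set[int],
    min_confidence: float,
) -> list[str]:
    """Keep objects whose caption passes the text filter and whose most likely
    quality label (numbered 1 to 5) is in keep_labels with enough confidence."""
    kept = []
    for cand in candidates:
        if not mentions_vehicle(cand.caption):
            continue
        probs = classify(mean_pool(cand.view_embeddings))
        best = max(range(len(probs)), key=lambda i: probs[i])
        if best + 1 in keep_labels and probs[best] >= min_confidence:
            kept.append(cand.uid)
    return kept


# Toy stand-in for a trained classifier: five label probabilities per object.
def toy_classifier(x: list[float]) -> list[float]:
    return [0.9, 0.05, 0.03, 0.01, 0.01] if x[0] > 0.5 else [0.05, 0.05, 0.1, 0.3, 0.5]


candidates = [
    Candidate("a", "A red sports car", [[0.9, 0.0], [0.7, 0.2]]),
    Candidate("b", "Wooden chair", [[0.9, 0.0]]),
    Candidate("c", "Low-poly truck", [[0.1, 0.0], [0.3, 0.0]]),
]
print(filter_candidates(candidates, toy_classifier, keep_labels={1}, min_confidence=0.6))  # ['a']
```

A real pipeline would replace `toy_classifier` with the trained quality classifier and, as the Figure 1 caption describes, would still send the kept objects to manual review and annotation.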