CLAS: A Machine Learning Enhanced Framework for Exploring Large 3D Design Datasets

XiuYu Zhang, Xiaolei Ye, Jui-Che Chang, Yue Fang

TL;DR

The paper addresses the challenge of efficiently finding relevant 3D objects in large datasets for design ideation, observing that current generative AI is less useful for producing 3D objects of satisfactory quality. It introduces CLAS, a four-step ML-enhanced framework (Capture, Label, Associate, Search) that converts 3D objects into text descriptions and learns cross-modal embeddings to enable text-based retrieval. A proof of concept on 6,778 ShapeNet chairs achieves an MRR of 0.58, a top-1 accuracy of 42.27%, and a top-10 accuracy of 89.64% in a closed-set setting. The paper also notes that the labeled data may be used to train and evaluate 3D generative models. The framework offers a flexible, practical path for repurposing existing 3D datasets for ideation and evaluation.

Abstract

Three-dimensional (3D) objects have wide applications. Despite the growing interest in 3D modeling in academia and industry, designing and creating 3D objects from scratch remains time-consuming and challenging. With the development of generative artificial intelligence (AI), designers have discovered a new way to create images for ideation. However, generative AI is less useful for creating 3D objects of satisfactory quality. To give 3D designers access to a wide range of 3D objects for creative activities based on their specific demands, we propose CLAS, a machine learning (ML) enhanced framework named after its four steps of capture, label, associate, and search, which enables fully automatic retrieval of 3D objects based on user specifications by leveraging existing datasets of 3D objects. CLAS provides an effective and efficient way for any person or organization to benefit from their existing but unused 3D datasets. In addition, CLAS may also be used to produce high-quality 3D object synthesis datasets for training and evaluating 3D generative models. As a proof of concept, we created and showcased a search system with a web user interface (UI), powered by CLAS, for retrieving 6,778 3D chair objects in the ShapeNet dataset. In a closed-set retrieval setting, our retrieval method achieves a mean reciprocal rank (MRR) of 0.58, a top-1 accuracy of 42.27%, and a top-10 accuracy of 89.64%.
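
To make the reported metrics concrete, the following is a minimal sketch of how MRR and top-k accuracy can be computed for text-to-object retrieval. It assumes that each query has exactly one correct item and that items are ranked by cosine similarity between embedding vectors. The function names (`cosine`, `rank_of_target`, `retrieval_metrics`) and the toy two-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_of_target(query_vec, item_vecs, target_index):
    """1-based rank of the target item when items are sorted by similarity."""
    scores = [cosine(query_vec, v) for v in item_vecs]
    target_score = scores[target_index]
    # Count items scoring strictly higher than the target (ties favor the target).
    return 1 + sum(1 for s in scores if s > target_score)

def retrieval_metrics(ranks, ks=(1, 10)):
    """Mean reciprocal rank and top-k accuracy from a list of 1-based ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    top_k = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, top_k

# Toy data (assumed): three items and three queries; query i should retrieve item i.
items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.6, 0.8], [0.7, 0.7]]

ranks = [rank_of_target(q, items, i) for i, q in enumerate(queries)]
mrr, top_k = retrieval_metrics(ranks, ks=(1, 2))
print(ranks, round(mrr, 4), top_k)
```

In this toy case the second query ranks its target second, so it contributes 1/2 to the reciprocal-rank sum and counts as a miss for top-1 accuracy but a hit for top-2 accuracy.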

Paper Structure

This paper contains 33 sections, 2 equations, 22 figures, 1 table.

Figures (22)

  • Figure 1: CLAS, named after its main steps of capture, label, associate, and search, enables fully automatic retrieval of 3D objects based on user specifications.
  • Figure 2: Overview of the CLAS-powered application for retrieving 3D chair objects using natural language.
  • Figure 3: Example of different prompt designs. Given the same chair image, the generated description varies with the prompt.
  • Figure 4: Effectiveness of prompt structure. The images on the left are fed to ChatGPT-4V with the prompt, and the generated text descriptions are then fed to DALL-E to generate the images on the right.
  • Figure 5: Fine-tuning of the CLIP model. Batch size: 32. Epochs: 5. Weight decay: 0.01. Warm-up steps: 50. Learning rate (with cosine scheduler): 2e-5. The image-description pairs in the validation set are not seen by the model during training, a situation unlikely to arise in closed-set retrieval, where the model is only used to retrieve items from a fixed dataset. Fine-tuning took about 30 minutes on an NVIDIA Tesla P100 GPU using 80% of the chairs in the ShapeNet dataset (of which 80% were used for training and 20% for validation).
  • ...and 17 more figures
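
The hyperparameters in the Figure 5 caption can be turned into a rough picture of the training schedule. The sketch below rests on several assumptions that the caption does not state: split sizes are rounded down, the last partial batch in each epoch is kept, and the cosine scheduler uses linear warm-up followed by a half-cosine decay to zero. The helper names `split_counts` and `lr_at_step` are hypothetical.

```python
import math

def split_counts(total, used_frac=0.8, train_frac=0.8):
    """Items used for fine-tuning, then the train/validation split (rounded down)."""
    used = int(total * used_frac)
    train = int(used * train_frac)
    return used, train, used - train

def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_steps=50):
    """Learning rate with linear warm-up, then cosine decay to zero (assumed form)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Values from the caption: 6,778 chairs, batch size 32, 5 epochs.
used, train, val = split_counts(6778)
steps_per_epoch = math.ceil(train / 32)  # keeps the last partial batch (assumption)
total_steps = steps_per_epoch * 5

print(used, train, val, steps_per_epoch, total_steps)
for step in (0, 25, 50, 365, total_steps):
    print(step, lr_at_step(step, total_steps))
```

Under these assumptions the 50 warm-up steps cover well under one epoch, so the learning rate is already decaying for most of the first pass over the training data.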