BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement

Yuhao Du, Shunian Chen, Wenbo Zan, Peizhao Li, Mingxuan Wang, Dingjie Song, Bo Li, Yan Hu, Benyou Wang

TL;DR

BlenderLLM addresses CAD automation by training LLMs to generate executable Blender bpy scripts from natural-language instructions. The method builds a training dataset, BlendNet (~8,000 samples), and a benchmark, CADBench, then refines the model through supervised fine-tuning followed by iterative self-improvement, aided by a cascade filter and an MLLM-as-judge. CADBench computes the final score as an average over the criteria set C, $\mathrm{Score} = \frac{1}{|C|} \sum_{c_i \in C} E(l, I, s, c_i)$, which enables open-ended evaluation across image and script criteria. BlenderLLM achieves state-of-the-art results on both CADBench-Sim and CADBench-Wild, outperforming the baselines. The work contributes data, models, and a benchmark to advance CAD automation with self-improving LLMs for open-ended design tasks and on-premises deployment.
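The scoring formula above is a plain average of per-criterion evaluator outputs. The following is a minimal sketch of that arithmetic only, not the benchmark's actual implementation. It assumes that l is the instruction, I the rendered images, and s the generated script (the summary does not define these symbols), and that E returns a numeric score per criterion. The names `cadbench_score` and `toy_evaluator` are hypothetical; the real evaluator is described as an MLLM-as-judge, which the keyword check below merely stands in for.

```python
from typing import Callable, Sequence

# Hypothetical evaluator signature (an assumption, not from the paper's code):
# (instruction l, images I, script s, criterion c_i) -> numeric score.
Evaluator = Callable[[str, Sequence[str], str, str], float]


def cadbench_score(
    evaluate: Evaluator,
    instruction: str,
    images: Sequence[str],
    script: str,
    criteria: Sequence[str],
) -> float:
    """Return (1/|C|) * sum over c_i in C of E(l, I, s, c_i)."""
    if not criteria:
        raise ValueError("criteria must be non-empty")
    total = sum(evaluate(instruction, images, script, c) for c in criteria)
    return total / len(criteria)


def toy_evaluator(
    instruction: str, images: Sequence[str], script: str, criterion: str
) -> float:
    # Stand-in for the judge: 1.0 if the criterion string occurs in the script.
    return 1.0 if criterion in script else 0.0


example_score = cadbench_score(
    toy_evaluator,
    "Create a cube.",
    [],  # no rendered images in this toy example
    "import bpy\nbpy.ops.mesh.primitive_cube_add(size=2)",
    ["primitive_cube_add", "primitive_uv_sphere_add"],
)
```

In this toy run one of the two criteria is satisfied, so `example_score` is 0.5. Passing the evaluator as a parameter keeps the averaging step independent of how each criterion is actually judged.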

Abstract

The application of Large Language Models (LLMs) in Computer-Aided Design (CAD) remains an underexplored area, despite their remarkable advancements in other domains. In this paper, we present BlenderLLM, a novel framework for training LLMs specifically for CAD tasks leveraging a self-improvement methodology. To support this, we developed a bespoke training dataset, BlendNet, and introduced a comprehensive evaluation suite, CADBench. Our results reveal that existing models demonstrate significant limitations in generating accurate CAD scripts. However, through minimal instruction-based fine-tuning and iterative self-improvement, BlenderLLM significantly surpasses these models in both functionality and accuracy of CAD script generation. This research establishes a strong foundation for the application of LLMs in CAD while demonstrating the transformative potential of self-improving models in advancing CAD automation. We encourage further exploration and adoption of these methodologies to drive innovation in the field. The dataset, model, benchmark, and source code are publicly available at https://github.com/FreedomIntelligence/BlenderLLM

Paper Structure

This paper contains 106 sections, 6 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustrative Instances
  • Figure 2: The Pipeline of the Methodology. In Step I, we utilize a multi-module pipeline to construct a high-quality training dataset and fine-tune the Base Model and Base Filter on it, establishing a foundation for the next phase. In Step II, the model is fine-tuned by Self-improvement until achieving the optimal model.
  • Figure 3: Diversity in Training and Evaluation Datasets. Each dataset is designed to ensure a uniform distribution across Category and Instruction Type, while maintaining a broad-ranging density in Instruction Length.
  • Figure 4: Dimensions of Criteria. Numbers represent the average count of criteria in that dimension.
  • Figure 5: Process for Script Generation. We carefully designed the prompt to maximize the responsiveness and effectiveness of GPT-4o, ensuring that it generates high-quality and contextually accurate CAD scripts.
  • ...and 5 more figures