BrepCoder: A Unified Multimodal Large Language Model for Multi-task B-rep Reasoning

Mingi Kim; Yongjun Kim; Jungwoo Kang; Hyungki Kim

TL;DR

BrepCoder is a unified Multimodal Large Language Model (MLLM) that performs diverse CAD tasks directly from B-rep inputs and achieves superior generalization across them, demonstrating its potential as a general-purpose CAD agent.

Abstract

Recent advancements in deep learning have actively addressed complex challenges within the Computer-Aided Design (CAD) domain. However, most existing approaches rely on task-specific models that require structural modifications for new tasks, and they predominantly focus on point clouds or images rather than the industry-standard Boundary Representation (B-rep) format. To address these limitations, we propose BrepCoder, a unified Multimodal Large Language Model (MLLM) that performs diverse CAD tasks from B-rep inputs. By leveraging the code generation capabilities of Large Language Models (LLMs), we convert CAD modeling sequences into Python-like code and align them with B-rep. We then adopt a two-stage training strategy: first, we pre-train on reverse engineering to learn geometric features and design logic; second, we effectively extend the model to various downstream tasks such as completion, error correction, and CAD-QA. Consequently, by interpreting B-rep as structural code, BrepCoder achieves superior generalization across diverse tasks, demonstrating its potential as a general-purpose CAD agent.
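To make the idea of "CAD modeling sequences as Python-like code" concrete, the following is a minimal sketch of how a sketch-and-extrude command sequence might be rendered as code-like text. It is an illustration under stated assumptions, not the paper's actual code format: the command classes (`Line`, `Circle`, `Extrude`), the emitted names (`Sketch`, `sketch.line`, `sketch.circle`, `sketch.extrude`), and the function `to_cad_code` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical command types for a sketch-and-extrude modeling sequence.
@dataclass
class Line:
    start: Tuple[float, float]
    end: Tuple[float, float]


@dataclass
class Circle:
    center: Tuple[float, float]
    radius: float


@dataclass
class Extrude:
    distance: float
    operation: str  # assumed labels, e.g. "new" or "cut"


Command = Union[Line, Circle, Extrude]


def to_cad_code(commands: List[Command]) -> str:
    """Render a command sequence as Python-like text.

    The emitted API names are placeholders chosen for readability;
    they do not correspond to a real CAD library.
    """
    lines = ["sketch = Sketch()"]
    for cmd in commands:
        if isinstance(cmd, Line):
            lines.append(f"sketch.line(start={cmd.start}, end={cmd.end})")
        elif isinstance(cmd, Circle):
            lines.append(
                f"sketch.circle(center={cmd.center}, radius={cmd.radius})"
            )
        elif isinstance(cmd, Extrude):
            lines.append(
                f"solid = sketch.extrude(distance={cmd.distance}, "
                f"operation={cmd.operation!r})"
            )
        else:
            raise TypeError(f"unsupported command: {cmd!r}")
    return "\n".join(lines)


# A cylinder: one circle profile extruded into a new solid.
example = [Circle(center=(0.0, 0.0), radius=0.5), Extrude(1.0, "new")]
code = to_cad_code(example)
print(code)
```

Compared with a flat sequence of integer tokens, text of this kind names each operation and its parameters explicitly, which is the property the abstract relies on when it refers to the code generation capabilities of LLMs.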

Paper Structure (11 sections, 9 equations, 4 figures, 3 tables)

Introduction
Related Work
Method
Multimodal Alignment
Multimodal Integration & Training
Experiments
Experimental Setups
Reverse Engineering
Downstream Tasks
Ablation Studies
Conclusion

Figures (4)

Figure 1: Comparison of CAD Learning Frameworks. (a) Task-Specific Architecture: Models that adopt different architectures depending on the task. (b) BrepCoder: A unified MLLM framework that employs a two-stage training strategy.
Figure 2: The overall framework of BrepCoder. In Phase 1, B-rep representations are aligned with CAD Code. In Phase 2, the frozen B-rep encoder is integrated with the LLM via a projector. Employing a Two-Stage Training strategy, Stage 1 captures the correspondence between geometric features and code through a reverse engineering task, while Stage 2 fine-tunes the model for diverse downstream tasks.
Figure 3: Comparison of CAD Representations. (a) DeepCAD representation composed of integer tokens. (b) Python-like CAD code representation that explicitly describes design logic.
Figure 4: Qualitative results of BrepCoder across various CAD domain tasks: (a) Reverse engineering, (b) Completion, (c) Error correction, and (d) CAD-QA.
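The Figure 2 caption describes a frozen B-rep encoder connected to the LLM through a projector. As a rough illustration of that general pattern, the sketch below maps encoder features into the LLM's embedding width and joins them with text token embeddings. Everything specific here is an assumption made for illustration: the projector is taken to be a single linear map, the geometric tokens are assumed to be prepended to the text tokens, and the dimensions and the names `make_projector`, `project`, and `build_input` are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def make_projector(in_dim: int, out_dim: int, seed: int = 0) -> Matrix:
    """Create a small random weight matrix of shape (out_dim, in_dim).

    In training this would be the learnable part; here it is just
    randomly initialized for demonstration.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
        for _ in range(out_dim)
    ]


def project(features: List[Vector], weights: Matrix) -> List[Vector]:
    """Apply the linear map to every encoder feature vector."""
    return [
        [sum(w * x for w, x in zip(row, feat)) for row in weights]
        for feat in features
    ]


def build_input(
    brep_features: List[Vector],
    text_embeddings: List[Vector],
    weights: Matrix,
) -> List[Vector]:
    """Prepend projected geometric tokens to the text embeddings (assumed order)."""
    return project(brep_features, weights) + text_embeddings


# Toy sizes: 2 geometric tokens of width 3, 3 text tokens of width 4.
brep_features = [[0.2, -0.1, 0.4], [0.0, 0.3, 0.1]]
text_embeddings = [[0.0] * 4, [1.0] * 4, [0.5] * 4]
weights = make_projector(in_dim=3, out_dim=4)

sequence = build_input(brep_features, text_embeddings, weights)
print(len(sequence), len(sequence[0]))
```

Keeping the encoder frozen means only the projector (and, depending on the stage, the LLM) needs to adapt, so the geometric features stay fixed while the mapping into the language model's input space is learned.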
