PLLM: Pseudo-Labeling Large Language Models for CAD Program Synthesis
Yuanbo Li, Dule Shu, Yanying Chen, Matt Klenk, Daniel Ritchie
TL;DR
The paper tackles the lack of paired shape-program data by proposing PLLM, a self-training framework that leverages unlabeled 3D shapes to synthesize supervision for CAD program synthesis. It uses a pre-trained CAD-capable LLM to generate candidate programs, executes them, and retains only high-fidelity program-shape pairs, then expands and diversifies the retained programs via program-level edits before fine-tuning. Applied to adapt CAD-Recode from DeepCAD to the ABC dataset, PLLM achieves consistent improvements in geometric fidelity and program diversity across iterations. This data-centric approach reduces reliance on manual annotations and enables scalable adaptation to new CAD languages and domains.
Abstract
Recovering Computer-Aided Design (CAD) programs from 3D geometries is a widely studied problem. Recent advances in large language models (LLMs) have enabled progress in CAD program synthesis, but existing methods rely on supervised training with paired shape-program data, which is often unavailable. We introduce PLLM, a self-training framework for CAD program synthesis from unlabeled 3D shapes. Given a pre-trained CAD-capable LLM and a shape dataset, PLLM iteratively samples candidate programs, selects high-fidelity executions, and augments programs to construct synthetic program-shape pairs for fine-tuning. Experiments on adapting CAD-Recode from DeepCAD to the unlabeled ABC dataset show consistent improvements in geometric fidelity and program diversity.
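One round of the sample, select, and augment procedure described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function pseudo_label_round, its callable parameters (sample_programs, execute, fidelity, augment), the fidelity threshold, and the toy "box(n)" program language are all hypothetical. In particular, pairing each augmented program with its own executed shape is an assumption about how the synthetic pairs are built. Fine-tuning on the returned pairs and repeating the round with the updated model are left outside the sketch.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

Shape = TypeVar("Shape")
Program = TypeVar("Program")


def pseudo_label_round(
    shapes: Iterable[Shape],
    sample_programs: Callable[[Shape, int], List[Program]],  # hypothetical: LLM sampling
    execute: Callable[[Program], Optional[Shape]],           # hypothetical: returns None on failure
    fidelity: Callable[[Shape, Shape], float],               # hypothetical: higher is better
    augment: Callable[[Program], List[Program]],             # hypothetical: program-level edits
    num_samples: int = 4,
    min_fidelity: float = 0.9,                               # assumed threshold, not from the paper
) -> List[Tuple[Program, Shape]]:
    """Build synthetic (program, shape) pairs from unlabeled shapes."""
    pairs: List[Tuple[Program, Shape]] = []
    for target in shapes:
        # Sample candidates and keep the best one that clears the threshold.
        best_program, best_score = None, min_fidelity
        for program in sample_programs(target, num_samples):
            result = execute(program)
            if result is None:
                continue  # program failed to execute
            score = fidelity(result, target)
            if score >= best_score:
                best_program, best_score = program, score
        if best_program is None:
            continue  # no candidate was faithful enough for this shape
        pairs.append((best_program, target))
        # Assumption: each edited program is paired with its own executed shape.
        for edited in augment(best_program):
            edited_shape = execute(edited)
            if edited_shape is not None:
                pairs.append((edited, edited_shape))
    return pairs


# Toy stand-ins: a "shape" is an integer side length, a "program" is "box(n)".
def toy_sample(target: int, n: int) -> List[str]:
    return [f"box({target + d})" for d in range(-1, n - 1)]


def toy_execute(program: str) -> Optional[int]:
    try:
        return int(program[4:-1])
    except ValueError:
        return None


def toy_fidelity(a: int, b: int) -> float:
    return 1.0 / (1.0 + abs(a - b))


def toy_augment(program: str) -> List[str]:
    size = toy_execute(program)
    return [] if size is None else [f"box({size * 2})"]


pairs = pseudo_label_round([3, 5], toy_sample, toy_execute, toy_fidelity, toy_augment)
print(pairs)
```

In a real setting the toy callables would be replaced by model sampling, a CAD kernel, and a geometric similarity metric; the threshold and the number of samples per shape would need tuning for the target dataset.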
