DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces

Mohammad Sadil Khan, Muhammad Usama, Rolandos Alexandros Potamias, Didier Stricker, Muhammad Zeshan Afzal, Jiankang Deng, Ismail Elezi

TL;DR

DreamCAD is a multi-modal generative framework that produces editable BReps directly from point-level supervision, without CAD-specific annotations. It achieves state-of-the-art performance on the ABC and Objaverse benchmarks across text, image, and point modalities, improving geometric fidelity and surpassing 75% user preference.

Abstract

Computer-Aided Design (CAD) relies on structured and editable geometric representations, yet existing generative methods are constrained by small annotated datasets with explicit design histories or boundary representation (BRep) labels. Meanwhile, millions of unannotated 3D meshes remain untapped, limiting progress in scalable CAD generation. To address this, we propose DreamCAD, a multi-modal generative framework that directly produces editable BReps from point-level supervision, without CAD-specific annotations. DreamCAD represents each BRep as a set of parametric patches (e.g., Bézier surfaces) and uses a differentiable tessellation method to generate meshes. This enables large-scale training on 3D datasets while reconstructing connected and editable surfaces. Furthermore, we introduce CADCap-1M, the largest CAD captioning dataset to date, with 1M+ descriptions generated using GPT-5 for advancing text-to-CAD research. DreamCAD achieves state-of-the-art performance on ABC and Objaverse benchmarks across text, image, and point modalities, improving geometric fidelity and surpassing 75% user preference. Code and dataset will be publicly available.
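To make the patch-plus-tessellation idea concrete, the following is a minimal sketch of evaluating a rational Bézier patch and sampling it into a triangle mesh. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bernstein`, `eval_patch`, `tessellate`), the uniform sampling grid, and the triangle split are all hypothetical choices. The sketch is written in plain Python and only shows the forward computation; because every output vertex is a smooth function of the control points and weights, the same arithmetic written in an autodiff framework would allow gradients to flow back to them.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t) on [0, 1]."""
    return comb(n, i) * (t ** i) * ((1 - t) ** (n - i))


def eval_patch(ctrl, weights, u, v):
    """Evaluate a rational Bezier patch at parameters (u, v).

    ctrl    : (n+1) x (m+1) nested list of (x, y, z) control points
    weights : nested list of positive scalars with the same grid shape
    """
    n, m = len(ctrl) - 1, len(ctrl[0]) - 1
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for i in range(n + 1):
        bu = bernstein(n, i, u)
        for j in range(m + 1):
            w = weights[i][j] * bu * bernstein(m, j, v)
            den += w
            for k in range(3):
                num[k] += w * ctrl[i][j][k]
    return tuple(c / den for c in num)


def tessellate(ctrl, weights, res=4):
    """Sample the patch on a (res+1) x (res+1) grid (hypothetical scheme)
    and split every grid cell into two triangles."""
    verts = [eval_patch(ctrl, weights, a / res, b / res)
             for a in range(res + 1) for b in range(res + 1)]
    faces = []
    for a in range(res):
        for b in range(res):
            i0 = a * (res + 1) + b      # corner (a, b)
            i1 = i0 + 1                 # corner (a, b + 1)
            i2 = i0 + res + 1           # corner (a + 1, b)
            i3 = i2 + 1                 # corner (a + 1, b + 1)
            faces.append((i0, i1, i3))
            faces.append((i0, i3, i2))
    return verts, faces


# Usage: a flat bicubic patch with unit weights over the unit square.
flat_ctrl = [[(i / 3, j / 3, 0.0) for j in range(4)] for i in range(4)]
unit_weights = [[1.0] * 4 for _ in range(4)]
verts, faces = tessellate(flat_ctrl, unit_weights, res=4)
```

With unit weights the rational patch reduces to an ordinary polynomial Bézier patch, since the denominator sums to one.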

Paper Structure

This paper contains 18 sections, 5 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Our proposed DreamCAD (left) is a multimodal generative framework that can reconstruct editable CAD models from text, images, and point clouds using parametric patches. CADCap-1M (right) provides 1M+ GPT-5–generated captions.
  • Figure 2: Bézier surface representation and differentiable tessellation.
  • Figure 3: DreamCAD overview. (A) A sparse Transformer VAE takes a mesh as input, generates active voxels $v_i$ with local features $f_i$ derived from DINOv2 embeddings, normal images, and SDF values, and encodes them into structured latents $z_i$. These are then decoded into parametric (rational Bézier) surfaces and optimized using a Chamfer loss. (B) Initial $C^0$-continuous parametric surface generation from sparse voxels via flood-fill and quad conversion, using grid control points and unit weights. (C) Multi-modal CAD generation from images or points using a coarse-to-fine flow-matching framework that proceeds from a coarse voxel grid to parametric surface refinement.
  • Figure 4: Examples of metadata-augmented captions from CADCap-1M showing object type, part names, and hole counts.
  • Figure 5: Qualitative comparison on Point2CAD (Top-Right), Image2CAD (Bottom-Left) and Text2CAD (Right) tasks. For each task, the first four examples are from the ABC dataset, while the last two are from the Objaverse dataset. ✗ indicates invalid models.
  • ...and 13 more figures
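
The Figure 3 caption mentions that the decoded surfaces are optimized with a Chamfer loss. As a rough illustration of what such a point-level objective measures, here is a minimal sketch of a symmetric Chamfer distance between two point sets. The exact variant used in the paper (squared versus unsquared distances, mean versus sum, any weighting) is not stated in this section, so the choices below and the name `chamfer_distance` are assumptions.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed variant: mean of squared nearest-neighbour distances,
    summed over both directions (a -> b and b -> a).
    """
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    # Brute-force nearest neighbours; fine for a small illustration.
    a_to_b = sum(min(sq_dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(sq_dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


# Usage: points sampled from a predicted surface vs. target points.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0)]
loss = chamfer_distance(predicted, target)
```

The brute-force search is quadratic in the number of points; a practical pipeline would typically use a spatial index or a batched GPU implementation instead.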