Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model

Ziyue Wang, Yayati Jadhav, Peter Pak, Amir Barati Farimani

TL;DR

Image2Gcode is an end-to-end diffusion-based framework that maps 2D images directly to G-code trajectories for material extrusion, bypassing CAD/STL intermediates. It uses a DinoV2-conditioned DDPM on per-layer slices to generate executable toolpaths, enabling rapid prototyping and generalization to real-world inputs such as photographs and hand-drawn sketches. The approach reduces workflow complexity, generates diverse infill patterns, and achieves modest improvements in path efficiency; physical prints validate the generated toolpaths. Its main limitation is the 2D per-slice formulation; future work could extend it to full 3D awareness and integrated process conditioning.

Abstract

Mechanical design and manufacturing workflows conventionally begin with conceptual design, followed by the creation of a computer-aided design (CAD) model and fabrication through material-extrusion (MEX) printing. This process requires converting CAD geometry into machine-readable G-code through slicing and path planning. While each step is well established, dependence on CAD modeling remains a major bottleneck: constructing object-specific 3D geometry is slow and poorly suited to rapid prototyping. Even minor design variations typically necessitate manual updates in CAD software, making iteration time-consuming and difficult to scale. To address this limitation, we introduce Image2Gcode, an end-to-end data-driven framework that bypasses the CAD stage and generates printer-ready G-code directly from images and part drawings. Instead of relying on an explicit 3D model, a hand-drawn or captured 2D image serves as the sole input. The framework first extracts slice-wise structural cues from the image and then employs a denoising diffusion probabilistic model (DDPM) over G-code sequences. Through iterative denoising, the model transforms Gaussian noise into executable print-move trajectories with corresponding extrusion parameters, establishing a direct mapping from visual input to native toolpaths. By producing structured G-code directly from 2D imagery, Image2Gcode eliminates the need for CAD or STL intermediates, lowering the entry barrier for additive manufacturing and accelerating the design-to-fabrication cycle. This approach supports on-demand prototyping from simple sketches or visual references and integrates with upstream 2D-to-3D reconstruction modules to enable an automated pipeline from concept to physical artifact. The result is a flexible, computationally efficient framework that advances accessibility in design iteration, repair workflows, and distributed manufacturing.
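
The iterative denoising described in the abstract can be illustrated with a minimal sketch of standard DDPM ancestral sampling over a sequence of (X, Y, E) points. This is not the authors' implementation: the linear noise schedule, the step count, the function names, and the `predict_noise` callable (a stand-in for the image-conditioned denoising network) are all assumptions made for illustration.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the paper's schedule is not given here."""
    return np.linspace(beta_start, beta_end, num_steps)


def ddpm_sample(predict_noise, cond, seq_len, channels=3, num_steps=50, seed=0):
    """Standard DDPM ancestral sampling, starting from Gaussian noise.

    predict_noise(x, t, cond) is a hypothetical stand-in for the trained
    denoiser; cond stands in for the image features it is conditioned on.
    Returns an array of shape (seq_len, channels), e.g. normalized (X, Y, E).
    """
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Inference begins from pure Gaussian noise.
    x = rng.standard_normal((seq_len, channels))
    for t in range(num_steps - 1, -1, -1):
        eps_hat = predict_noise(x, t, cond)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Add noise with variance beta_t on every step except the last.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


def dummy_predict_noise(x, t, cond):
    """Placeholder denoiser that predicts zero noise (illustration only)."""
    return np.zeros_like(x)


toolpath = ddpm_sample(dummy_predict_noise, cond=None, seq_len=16)
print(toolpath.shape)
```

With a trained network in place of `dummy_predict_noise`, the returned array would be mapped back to machine coordinates and written out as G-code moves; the dummy here only exercises the sampling loop.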

Paper Structure

This paper contains 15 sections, 3 equations, 7 figures.

Figures (7)

  • Figure 1: Image2Gcode Overview. Our end-to-end framework generates printer-ready G-code toolpaths directly from visual inputs. (a) The system accepts object photographs and hand-drawn sketches. (b) Preprocessing extracts geometric boundaries from input images. (c) A Denoising Diffusion Probabilistic Model (DDPM) comprises (i) a pre-trained DinoV2 vision encoder that extracts multi-scale semantic features, (ii) a 1D U-Net decoder conditioned on these features via cross-attention that progressively denoises sequences to generate toolpaths, and (iii) Gaussian noise initialization during inference. (d) The predicted G-code defines continuous extrusion trajectories capturing geometry-specific infill patterns. (e) Physical parts fabricated via MEX.
  • Figure 2: Preprocessing pipeline. Visualization illustrating the extraction of slice-level training pairs. For each layer (left column), the corresponding G-code toolpath is visualized with complete trajectories (middle column) and extracted key points colored by normalized extrusion rate (right column). Point colors encode the normalized extrusion values $E \in [-1, 1]$, where darker values indicate higher material deposition rates. This representation captures both spatial trajectory information through $(X, Y)$ coordinates and material deposition characteristics through the extrusion channel $E$.
  • Figure 3: Model Architecture. The framework consists of a conditioning network and DDPM denoising network. The conditioning network processes input slice images through linear projection and a frozen DinoV2 transformer encoder to extract multi-scale visual features. The DDPM network uses a 1D U-Net with cross-attention mechanisms (QKV blocks) that fuse these visual features with trajectory sequences at multiple scales. The transformer block detail (right) shows the standard architecture with normalization, self-attention, and MLP layers connected via residual connections.
  • Figure 4: Generated samples. Qualitative results on validation samples from Slice-100K. For each geometry, the model generates structurally coherent toolpaths from input slice images. Generated toolpaths (third column) demonstrate accurate boundary reproduction and learned infill pattern selection, with physical prints (rightmost column) validating manufacturability and dimensional fidelity across diverse geometric primitives and infill strategies.
  • Figure 5: Generalization to real-world inputs. Photographs of physical objects (rows 1-2) and hand-drawn sketches (rows 3-4). The preprocessing module extracts boundary geometry from raw images, enabling the model to generate manufacturable toolpaths despite significant distribution shift from synthetic training data. Physical prints validate successful generalization across diverse input modalities.
  • ...and 2 more figures
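
The $(X, Y, E)$ representation in the Figure 2 caption can be illustrated with a small sketch that extracts key points from G-code and rescales each channel to $[-1, 1]$. The caption does not state the exact normalization, so the per-channel min-max scaling below is an assumption, as are the function names and the choice to read `E` as a per-move value (relative extrusion mode) and to consider only `G1` moves.

```python
import re

import numpy as np

# Matches X, Y, or E words such as "X10", "Y-2.5", "E0.5".
_WORD = re.compile(r"([XYE])(-?\d+(?:\.\d*)?|-?\.\d+)")


def parse_g1_moves(gcode_text):
    """Collect (X, Y, E) rows from G1 lines; other commands are skipped.

    Assumptions: X/Y carry over from the previous move when omitted, and a
    missing E means no extrusion on that move.
    """
    rows = []
    state = {"X": 0.0, "Y": 0.0}
    for line in gcode_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        tokens = line.split()
        if not tokens or tokens[0] != "G1":
            continue
        words = {key: float(val) for key, val in _WORD.findall(line)}
        state["X"] = words.get("X", state["X"])
        state["Y"] = words.get("Y", state["Y"])
        rows.append([state["X"], state["Y"], words.get("E", 0.0)])
    return np.array(rows, dtype=float).reshape(-1, 3)


def normalize_to_unit_range(points):
    """Per-channel min-max scaling to [-1, 1] (assumed scheme)."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return 2.0 * (points - lo) / span - 1.0, lo, span


def denormalize(norm, lo, span):
    """Invert normalize_to_unit_range to recover machine units."""
    return (norm + 1.0) / 2.0 * span + lo


sample = "G1 X0 Y0 E0\nG1 X10 Y0 E0.5 ; wall\nG1 X10 Y10 E1.0\nG28\n"
points = parse_g1_moves(sample)
norm, lo, span = normalize_to_unit_range(points)
print(norm)
```

Keeping `lo` and `span` alongside the normalized points is what lets a generated sequence be converted back to printer coordinates with `denormalize`.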