SwiftTailor: Efficient 3D Garment Generation with Geometry Image Representation

Phuc Pham, Uy Dieu Tran, Binh-Son Hua, Phong Nguyen

Abstract

Realistic and efficient 3D garment generation remains a longstanding challenge in computer vision and digital fashion. Existing methods typically rely on large vision-language models to produce serialized representations of 2D sewing patterns, which are then transformed into simulation-ready 3D meshes using a garment modeling framework such as GarmentCode. Although these approaches yield high-quality results, they often suffer from slow inference, taking from 30 seconds to a minute. In this work, we introduce SwiftTailor, a novel two-stage framework that unifies sewing-pattern reasoning and geometry-based mesh synthesis through a compact geometry image representation. SwiftTailor comprises two lightweight modules: PatternMaker, an efficient vision-language model that predicts sewing patterns from diverse input modalities, and GarmentSewer, an efficient dense prediction transformer that converts these patterns into a novel Garment Geometry Image, encoding the 3D surface of all garment panels in a unified UV space. The final 3D mesh is reconstructed through an efficient inverse mapping process that incorporates remeshing and dynamic stitching algorithms to directly assemble the garment, thereby amortizing the cost of physical simulation. Extensive experiments on the Multimodal GarmentCodeData dataset demonstrate that SwiftTailor achieves state-of-the-art accuracy and visual fidelity while significantly reducing inference time. This work offers a scalable, interpretable, and high-performance solution for next-generation 3D garment generation.

Paper Structure (40 sections, 1 equation, 12 figures, 8 tables, 6 algorithms)

Figures (12)

  • Figure 1: We introduce SwiftTailor, a two-stage framework including PatternMaker and GarmentSewer that aims to produce sewing patterns along with a novel garment geometry image representation that can be directly decoded to final 3D garment meshes.
  • Figure 2: Preliminaries on geometry images [gu2002geometryimages, sander2003multi], an image-based 3D representation that parameterizes a 3D mesh into charts, each stored as a simple array of pixels. Our work integrates geometry images with semantic and stitching information to establish garment panels, yielding a novel garment geometry image representation suitable for 3D garment generation.
  • Figure 3: Overall pipeline. Our PatternMaker is a relatively small vision-language model (InternVL-3-2B [wang2025internvl3]) trained to output sewing patterns. The sewing patterns are constructed from discrete tokens and continuous parameters predicted by the VLM. Our GarmentSewer is a dense prediction transformer (DPT) that predicts a garment geometry image from the sewing patterns. In this step, we preprocess the sewing pattern to obtain the semantic and stitching maps, which are then passed to the DPT to predict the geometry image, completing our garment geometry image representation (GGI). We then perform a postprocessing step to convert the GGI to a final 3D mesh.
  • Figure 4: (Left) How we prepare the three components (geometry, semantic, and stitching) of our proposed Garment Geometry Image (GGI); (Right) Starting from the geometry and stitching images estimated by GarmentSewer and PatternMaker, two additional steps, remeshing and stitching, are performed to obtain the final 3D mesh.
  • Figure 5: Qualitative comparisons between SwiftTailor and recent state-of-the-art 3D garment modeling methods [aipparel, chatgarment, sewingldm], using an image, a text prompt, and both text and image as input, respectively.
  • ...and 7 more figures
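
The geometry image idea referenced in Figure 2 can be illustrated with a minimal sketch. It assumes a single image whose valid pixels each store an (x, y, z) surface point, plus a boolean mask marking which pixels belong to a chart. The function name `geometry_image_to_mesh` and the mask convention are hypothetical, chosen for illustration. This is plain grid triangulation, not the paper's remeshing or dynamic stitching procedure.

```python
import numpy as np

def geometry_image_to_mesh(geom, mask):
    """Decode a geometry image into vertices and triangle faces.

    geom: float array (H, W, 3); each valid pixel stores an (x, y, z) point.
    mask: bool array (H, W); True where the pixel belongs to a chart.
    (Both conventions are assumptions made for this sketch.)
    """
    h, w, _ = geom.shape
    # Give every valid pixel a vertex index; invalid pixels stay at -1.
    index = np.full((h, w), -1, dtype=np.int64)
    index[mask] = np.arange(mask.sum())
    vertices = geom[mask]  # row-major order, matching the indices above

    faces = []
    for i in range(h - 1):
        for j in range(w - 1):
            a, b = index[i, j], index[i, j + 1]
            c, d = index[i + 1, j], index[i + 1, j + 1]
            # Split each fully valid 2x2 pixel block into two triangles.
            if min(a, b, c, d) >= 0:
                faces.append((a, b, c))
                faces.append((b, d, c))
    return vertices, np.array(faces, dtype=np.int64).reshape(-1, 3)

# Usage: a flat 3x3 patch, where pixel (row, col) maps to the point (col, row, 0).
ys, xs = np.mgrid[0:3, 0:3]
geom = np.stack([xs, ys, np.zeros_like(xs)], axis=-1).astype(float)
vertices, faces = geometry_image_to_mesh(geom, np.ones((3, 3), dtype=bool))
```

Because neighboring pixels are treated as neighboring surface points, connectivity within a chart comes directly from the pixel grid. Only the seams between charts would need extra information, which is the role the captions assign to the stitching map.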