NGL-Prompter: Training-Free Sewing Pattern Estimation from a Single Image

Anna Badalyan, Pratheba Selvaraju, Giorgio Becherini, Omid Taheri, Victoria Fernandez Abrevaya, Michael Black

TL;DR

The paper proposes NGL (Natural Garment Language), a novel intermediate language that restructures GarmentCode into a representation more understandable to language models, and introduces NGL-Prompter, a training-free pipeline that queries large VLMs to extract structured garment parameters, which are then deterministically mapped to valid GarmentCode.

Abstract

Estimating sewing patterns from images is a practical approach for creating high-quality 3D garments. Due to the lack of real-world pattern-image paired data, prior approaches fine-tune large vision language models (VLMs) on synthetic garment datasets generated by randomly sampling from GarmentCode, a parametric garment model. However, these methods often struggle to generalize to in-the-wild images, fail to capture real-world correlations between garment parts, and are typically restricted to single-layer outfits. In contrast, we observe that VLMs are effective at describing garments in natural language, yet perform poorly when asked to directly regress GarmentCode parameters from images. To bridge this gap, we propose NGL (Natural Garment Language), a novel intermediate language that restructures GarmentCode into a representation more understandable to language models. Leveraging this language, we introduce NGL-Prompter, a training-free pipeline that queries large VLMs to extract structured garment parameters, which are then deterministically mapped to valid GarmentCode. We evaluate our method on Dress4D, CloSe, and a newly collected dataset of approximately 5,000 in-the-wild fashion images. Our approach achieves state-of-the-art performance on standard geometry metrics and is strongly preferred over existing baselines in both human and GPT-based perceptual evaluations. Furthermore, NGL-Prompter can recover multi-layer outfits, whereas competing methods focus mostly on single-layer garments, highlighting its strong generalization to real-world images even with occluded parts. These results demonstrate that accurate sewing pattern reconstruction is possible without costly model training. Our code and data will be released for research use.
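The "deterministically mapped" step in the abstract can be illustrated with a minimal sketch. It assumes that each attribute takes one value from a small closed vocabulary and that each value corresponds to a fixed numeric design parameter. The attribute names, option lists, and numbers below are hypothetical placeholders, not the actual NGL schema or GarmentCode parameter names.

```python
# Hypothetical lookup tables: attribute names, options, and numeric values
# are illustrative assumptions, not the paper's NGL schema.
SLEEVE_LENGTH = {"sleeveless": 0.0, "short": 0.25, "elbow": 0.5, "long": 1.0}
SKIRT_LENGTH = {"mini": 0.3, "knee": 0.5, "midi": 0.7, "maxi": 0.95}

LOOKUP = {"sleeve_length": SLEEVE_LENGTH, "skirt_length": SKIRT_LENGTH}


def ngl_to_params(ngl: dict[str, str]) -> dict[str, float]:
    """Map discrete, natural-language attribute values to numeric parameters.

    The mapping is a pure table lookup, so the same input always yields the
    same output, and any value outside the vocabulary is rejected.
    """
    params: dict[str, float] = {}
    for name, value in ngl.items():
        table = LOOKUP.get(name)
        if table is None:
            raise KeyError(f"unknown attribute: {name}")
        if value not in table:
            raise ValueError(f"invalid value {value!r} for attribute {name}")
        params[name] = table[value]
    return params


if __name__ == "__main__":
    print(ngl_to_params({"sleeve_length": "short", "skirt_length": "midi"}))
```

Because the language model only ever selects among predefined options, every accepted answer maps to a parameter set inside the valid range, which is one plausible reading of why the output is always valid GarmentCode.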

Paper Structure (17 sections, 7 figures, 6 tables)


Figures (7)

  • Figure 1: 3D garment reconstruction by NGL-Prompter. Given an image of a clothed person, our method estimates sewing patterns in a training-free manner, handling both single and multi-layer outfits. The method also seamlessly supports text input (right).
  • Figure 2: Overview of NGL-Prompter and rendering pipeline. Given a single image containing a single- or multi-layer outfit, NGL-Prompter first prompts a VLM to identify garment types, then applies a sequence of rule-based, dependency-aware prompts, where each step conditions on the VLM’s previous outputs, until all required attributes are resolved. The selected attributes are then compiled into a structured JSON output, which is further converted by the parser into GarmentCode parameters. The top row depicts our NGL-Prompter system. The remaining blocks illustrate our textured mesh generation and rendering pipeline: we recover the 3D human pose and extract garment texture using off-the-shelf methods (TokenHMR and FabricDiffusion). The predicted GarmentCode parameters are passed to GarmentCode to generate 2D sewing patterns, which are assembled into 3D garments. Finally, the garment mesh, extracted body pose, and texture are provided to a cloth simulation package (e.g., CLO3D or ContourCraft) to obtain a draped reconstruction.
  • Figure 3: Natural Garment Language (NGL). For the given input image, we show the reconstructed garment rendered with our pipeline together with the inferred NGL parameter–value pairs.
  • Figure 4: Empirical results on VLMs' knowledge about garments. The plot shows the F1 score computed on our ASOS_labeled dataset (Ref. \ref{ssec:asos_dataset}) across various model sizes and a selected set of NGL parameters. All models can confidently identify intricate garment details that are commonly described on fashion websites (e.g., a straight or heart-shaped strapless neckline), but struggle with details that are not commonly described (e.g., a skirt with the back longer than the front, or one side longer than the other).
  • Figure 5: Quantitative results on Garment Attribute Accuracy across NGL LODs. We report the F1 score on our ASOS_labeled dataset (Ref. \ref{ssec:asos_dataset}) to evaluate the prediction accuracy for different design details across model sizes. NGL-0 $\cap$ NGL-1 denotes the subset of attributes shared by both LODs. Overall, NGL-0 performs best, suggesting that current VLMs still require additional cues to reliably capture finer-grained details at higher LODs.
  • ...and 2 more figures
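
The dependency-aware prompting described in the Figure 2 caption can be sketched as a loop in which each question is asked only if earlier answers make it relevant, and the resolved attributes are compiled into JSON. This is a simplified single-garment sketch under stated assumptions: the question list, option vocabularies, and the `ask_vlm` callable are hypothetical stand-ins for the real prompts and VLM API, and a stub replaces the model so the example runs offline.

```python
import json
from typing import Callable

# Hypothetical question schema (not the paper's prompts): each entry names an
# attribute, lists its allowed options, and gives a rule that decides whether
# the question applies, based on the answers collected so far.
QUESTIONS = [
    {"name": "garment_type", "options": ["dress", "top", "skirt", "pants"],
     "ask_if": lambda a: True},
    {"name": "sleeve_length", "options": ["sleeveless", "short", "long"],
     "ask_if": lambda a: a.get("garment_type") in {"dress", "top"}},
    {"name": "skirt_length", "options": ["mini", "knee", "maxi"],
     "ask_if": lambda a: a.get("garment_type") in {"dress", "skirt"}},
]


def resolve_attributes(ask_vlm: Callable[[str, str, list[str]], str]) -> str:
    """Ask the applicable questions in order and return the answers as JSON."""
    answers: dict[str, str] = {}
    for q in QUESTIONS:
        if not q["ask_if"](answers):
            continue  # rule-based skip: attribute is irrelevant for this garment
        # Each prompt conditions on the answers already collected.
        prompt = (f"Known so far: {json.dumps(answers)}. "
                  f"Choose {q['name']} from {q['options']}.")
        reply = ask_vlm(q["name"], prompt, q["options"])
        if reply not in q["options"]:
            raise ValueError(f"answer {reply!r} is not a valid {q['name']}")
        answers[q["name"]] = reply
    return json.dumps(answers)


def stub_vlm(name: str, prompt: str, options: list[str]) -> str:
    """Stand-in for a real VLM call; returns canned answers for the demo."""
    canned = {"garment_type": "skirt", "skirt_length": "knee"}
    return canned.get(name, options[0])


if __name__ == "__main__":
    print(resolve_attributes(stub_vlm))
```

With the stub above, the sleeve question is never asked because the garment type is a skirt. A multi-layer outfit would presumably repeat the loop once per detected garment, which this sketch omits.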