AIpparel: A Multimodal Foundation Model for Digital Garments

Kiyohiro Nakayama; Jan Ackermann; Timur Levent Kesdogan; Yang Zheng; Maria Korosteleva; Olga Sorkine-Hornung; Leonidas J. Guibas; Guandao Yang; Gordon Wetzstein

TL;DR

AIpparel is a multimodal foundation model for digital garments. It fine-tunes a large multimodal model (LMM) on the GarmentCodeData-MultiModal (GCD-MM) dataset and uses a compact sewing-pattern tokenizer to encode complex sewing-pattern geometry. The model supports image-to-pattern prediction, text-conditioned generation, and language-driven editing; it outperforms state-of-the-art baselines on the single-modal tasks and enables new multimodal garment workflows. The key contributions are the GCD-MM dataset, a lightweight yet expressive tokenization scheme, and regression heads for continuous pattern parameters, which together yield simulation-ready sewing patterns. The work carries web-scale vision-language knowledge over to garment generation and editing, with potential gains in design efficiency and fabrication, and it acknowledges dataset biases and societal considerations.

Abstract

Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a multimodal foundation model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns. Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently. AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and enables novel multimodal garment generation applications such as interactive garment editing. The project website is at https://georgenakayama.github.io/AIpparel/.
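To make the tokenization idea more concrete, here is a minimal sketch of how a sewing pattern could be split into a short sequence of discrete structure tokens plus a parallel list of continuous values, which a regression head could then predict. This is an illustration only, not the paper's actual scheme: the `Edge` and `Panel` classes, the token names such as `<panel>` and `<line>`, and the parameter layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Edge:
    kind: str                  # hypothetical edge type, e.g. "line" or "curve"
    params: Tuple[float, ...]  # continuous values, e.g. endpoint and curvature


@dataclass
class Panel:
    name: str
    edges: List[Edge]


def tokenize(panels: List[Panel]) -> Tuple[List[str], List[Tuple[float, ...]]]:
    """Split a pattern into discrete tokens and continuous parameters.

    Each edge contributes exactly one discrete token (its type); its
    numeric parameters go to a separate list in the same order, so they
    are never discretized into the token vocabulary.
    """
    tokens: List[str] = ["<pattern>"]
    continuous: List[Tuple[float, ...]] = []
    for panel in panels:
        tokens.append("<panel>")
        tokens.append(panel.name)
        for edge in panel.edges:
            tokens.append(f"<{edge.kind}>")
            continuous.append(edge.params)
        tokens.append("</panel>")
    tokens.append("</pattern>")
    return tokens, continuous


# Hypothetical example: one panel with two straight edges and one curve.
front = Panel(
    "front",
    [
        Edge("line", (0.0, 0.0)),
        Edge("line", (30.0, 0.0)),
        Edge("curve", (30.0, 50.0, 0.5, 0.2)),
    ],
)
tokens, values = tokenize([front])
print(tokens)
print(values)
```

In this sketch the sequence length grows with the number of panels and edges rather than with the number of coordinates, which is one plausible reading of what makes a tokenization "compact"; the design actually used by AIpparel may differ.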
