CustAny: Customizing Anything from A Single Example

Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Mengtian Li, Jiangning Zhang, Chengjie Wang, Yanwei Fu

TL;DR

This paper proposes a novel pipeline to construct a large dataset of general objects and build the Multi-Category ID-Consistent (MC-IDC) dataset, and introduces Customizing Anything (CustAny), a zero-shot framework that maintains ID fidelity and supports flexible text editing for general objects.

Abstract

Recent advances in diffusion-based text-to-image models have simplified creating high-fidelity images, but preserving the identity (ID) of specific elements, such as a personal dog, is still challenging. Object customization, using reference images and textual descriptions, is key to addressing this issue. Current object customization methods are either object-specific, requiring extensive fine-tuning, or object-agnostic, offering zero-shot customization but limited to specialized domains. The main obstacle to extending zero-shot object customization from specific domains to the general domain is establishing a large-scale general ID dataset for model pre-training, which is time-consuming and labor-intensive. In this paper, we propose a novel pipeline to construct a large dataset of general objects and build the Multi-Category ID-Consistent (MC-IDC) dataset, featuring 315k text-image samples across 10k categories. With the help of MC-IDC, we introduce Customizing Anything (CustAny), a zero-shot framework that maintains ID fidelity and supports flexible text editing for general objects. CustAny features three key components: a general ID extraction module, a dual-level ID injection module, and an ID-aware decoupling module, allowing it to customize any object from a single reference image and text prompt. Experiments demonstrate that CustAny outperforms existing methods in both general object customization and specialized domains like human customization and virtual try-on. Our contributions include a large-scale dataset, the CustAny framework, and novel ID processing to advance this field. Code and dataset will be released soon at https://github.com/LingjieKong-fdu/CustAny.

Paper Structure (14 sections, 11 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Customizing any object from a single example (a text prompt and a reference image). Our CustAny can achieve various customizations for general objects with high ID fidelity and flexible text editability, without further fine-tuning.
  • Figure 2: Illustration of our general ID dataset, MC-IDC. In each sample, the reference image with the object mask provides ID information, the text prompt offers semantic-level guidance for generation, and the target image serves as the ground truth.
  • Figure 3: Overview of our CustAny. CustAny is a zero-shot text-to-image customization method for general objects, consisting of a general ID extractor, global-local dual-level ID injection, and an ID-aware decoupling module.
  • Figure 4: Qualitative results on general domains and two specific domains: human customization and virtual try-on. CustAny exhibits great ID-preserving ability with better text control and more diverse generations on both general objects and specialized domains.
  • Figure 5: Results: the same reference with different text prompts.
  • ...and 7 more figures
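
The sample layout described in the Figure 2 caption (a reference image with an object mask, a text prompt, and a target image) can be sketched as a small record type. This is only an illustration under stated assumptions: the names `IDSample` and `masked_reference`, and the nested-list image representation, are hypothetical and do not reflect the released MC-IDC format or the CustAny code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for an image: H x W grid of grayscale intensities.
Image = List[List[float]]
Mask = List[List[int]]  # 1 = object pixel, 0 = background


@dataclass
class IDSample:
    """One training sample as described in the Figure 2 caption (assumed layout)."""
    reference: Image  # reference image carrying the object's ID
    mask: Mask        # object mask over the reference image
    prompt: str       # text prompt giving semantic-level guidance
    target: Image     # target image used as ground truth


def masked_reference(sample: IDSample) -> Image:
    """Zero out background pixels so only the object region remains."""
    if len(sample.reference) != len(sample.mask):
        raise ValueError("reference and mask must have the same height")
    out = []
    for row, mask_row in zip(sample.reference, sample.mask):
        if len(row) != len(mask_row):
            raise ValueError("reference and mask must have the same width")
        out.append([pixel * m for pixel, m in zip(row, mask_row)])
    return out


sample = IDSample(
    reference=[[0.5, 0.2], [0.9, 0.1]],
    mask=[[1, 0], [0, 1]],
    prompt="a dog on the beach",
    target=[[0.0, 0.0], [0.0, 0.0]],
)
object_only = masked_reference(sample)
```

Keeping the mask alongside the reference image, rather than pre-cropping, leaves the choice of how to isolate the object to whatever consumes the sample.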