IC-Custom: Diverse Image Customization via In-Context Learning

Yaowei Li; Xiaoyu Li; Zhaoyang Zhang; Yuxuan Bian; Gan Liu; Xinyuan Li; Jiale Xu; Wenbo Hu; Yating Liu; Lingen Li; Jing Cai; Yuexian Zou; Yancheng He; Ying Shan

TL;DR

IC-Custom addresses the challenge of unified image customization by bridging position-aware and position-free tasks under a single in-context framework. It introduces In-Context Multi-Modal Attention (ICMA), which uses learnable task tokens and boundary-aware positional embeddings, together with an in-context diptych input representation built on the DiT architecture. For training, it constructs CustomData (12K identity-consistent diptychs). Evaluated on ProductBench and DreamBench, it achieves substantial gains in identity consistency, harmony, and text alignment while updating only ~0.4% of the model's parameters. These results indicate strong practical potential for industrial media workflows, enabling flexible, identity-preserving edits across diverse scenarios.
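To make the diptych input idea concrete, here is a minimal NumPy sketch, assuming images are H x W x C float arrays, the reference goes in the left panel, and the target canvas goes in the right panel. The function name `make_diptych` and the masking convention are hypothetical illustrations, not the authors' code: for a position-aware task the target background is kept and only a user-given region is blanked, while for a position-free task the entire right panel is blank and must be generated.

```python
import numpy as np


def make_diptych(reference, target=None, region_mask=None):
    """Hypothetical sketch: place reference (left) and target canvas (right) side by side.

    reference:   H x W x C float array holding the identity to preserve.
    target:      H x W x C float array for position-aware tasks, or None for position-free tasks.
    region_mask: H x W bool array marking where the subject should appear (position-aware only).
    Returns (diptych, gen_mask): an H x 2W x C image and an H x 2W bool mask of pixels to generate.
    """
    h, w, _ = reference.shape
    if target is None:
        # Position-free: the whole right panel is blank and must be generated.
        canvas = np.zeros_like(reference)
        right_mask = np.ones((h, w), dtype=bool)
    else:
        if region_mask is None:
            raise ValueError("position-aware customization needs a region mask")
        # Position-aware: keep the target background, blank out the masked region.
        canvas = np.where(region_mask[..., None], 0.0, target)
        right_mask = region_mask.astype(bool)
    diptych = np.concatenate([reference, canvas], axis=1)
    # The reference panel is context only, so none of its pixels are generated.
    gen_mask = np.concatenate([np.zeros((h, w), dtype=bool), right_mask], axis=1)
    return diptych, gen_mask
```

Both task families end up with the same input layout and differ only in what the generation mask covers, which is the property that lets a single model serve both.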

Abstract

Image customization, a crucial technique for industrial media production, aims to generate content that is consistent with reference images. However, current approaches conventionally separate image customization into position-aware and position-free paradigms and lack a universal framework for diverse customization, which limits their applicability across scenarios. To overcome these limitations, we propose IC-Custom, a unified framework that seamlessly integrates position-aware and position-free image customization through in-context learning. IC-Custom concatenates reference images with target images into a polyptych, leveraging DiT's multi-modal attention mechanism for fine-grained token-level interactions. We propose the In-context Multi-Modal Attention (ICMA) mechanism, which employs learnable task-oriented register tokens and boundary-aware positional embeddings to enable the model to effectively handle diverse tasks and distinguish between inputs in polyptych configurations. To address the data gap, we curated a 12K identity-consistent dataset with 8K real-world and 4K high-quality synthetic samples, avoiding the overly glossy, oversaturated look typical of synthetic data. IC-Custom supports various industrial applications, including try-on, image insertion, and creative IP customization. Extensive evaluations on our proposed ProductBench and the publicly available DreamBench demonstrate that IC-Custom significantly outperforms community workflows, closed-source models, and state-of-the-art open-source approaches. IC-Custom achieves about 73% higher human preference across identity consistency, harmony, and text alignment metrics, while training only 0.4% of the original model parameters. Project page: https://liyaowei-stu.github.io/project/IC_Custom
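The ICMA description can be illustrated with a small NumPy sketch of how a joint token sequence might be assembled. Everything here is an assumption for illustration, not the paper's implementation: the task names, embedding width `D`, register count `N_REG`, the `boundary_gap` column offset that keeps reference and target panels from sharing positional ids, and the additive sinusoidal encoding, which stands in for whatever positional scheme the DiT backbone actually uses. A single softmax attention over the full sequence then lets every target patch attend to every reference patch and to the task registers.

```python
import numpy as np

rng = np.random.default_rng(0)

TASKS = ("position_aware", "position_free")  # hypothetical task set
D = 8      # hypothetical embedding width (must be divisible by 4 here)
N_REG = 2  # hypothetical number of register tokens per task

# Hypothetical learnable task-oriented register tokens (randomly initialized here).
task_registers = {t: rng.normal(size=(N_REG, D)) for t in TASKS}


def panel_positions(h, w, col_offset):
    """(row, col) ids for an h x w patch grid in row-major order, shifted right by col_offset."""
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([rows.ravel(), cols.ravel() + col_offset], axis=1)


def sincos(v, n):
    """n sinusoidal features (n/2 sines, n/2 cosines) of a 1-D array of positions."""
    freqs = 1.0 / (100.0 ** (np.arange(n // 2) / (n // 2)))
    ang = v[:, None] * freqs[None, :]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)


def encode_positions(pos, d):
    """Additive 2-D encoding: half the width for rows, half for columns (an assumed stand-in)."""
    return np.concatenate([sincos(pos[:, 0], d // 2), sincos(pos[:, 1], d // 2)], axis=1)


def build_sequence(ref_tokens, tgt_tokens, h, w, task, boundary_gap=1):
    """Concatenate [task registers | reference patches | target patches] with position ids.

    Target columns start at w + boundary_gap, so the two panels never share a
    position id; registers get the arbitrary position (-1, -1).
    """
    regs = task_registers[task]
    tokens = np.concatenate([regs, ref_tokens, tgt_tokens], axis=0)
    pos = np.concatenate([
        np.full((len(regs), 2), -1),
        panel_positions(h, w, 0),
        panel_positions(h, w, w + boundary_gap),
    ], axis=0)
    return tokens + encode_positions(pos, tokens.shape[1]), pos


def joint_attention(x):
    """Single-head softmax self-attention in which every token attends to every token."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights
```

Swapping the `task` argument changes only the register tokens at the front of the sequence, so the same attention weights can be steered toward different customization behaviors.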
