Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu

TL;DR

Character-Adapter addresses the challenge of maintaining high-fidelity character details in image synthesis by introducing a plug-and-play framework that uses prompt-guided segmentation to localize character regions and dynamic region-level adapters to preserve region-specific features. By leveraging cross-attention cues within diffusion models and applying region-specific fusion, it enables both single- and multi-character generation without additional training. The approach achieves state-of-the-art zero-shot character consistency—demonstrated by substantial improvements in CLIP-I and DINO-I metrics—while maintaining strong text–image alignment and computational efficiency. This method enhances practical applicability for storytelling, portrait design, and character-centric editing by offering flexible, region-aware control and broad compatibility with existing editing tools.
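The idea of turning cross-attention cues into region masks can be illustrated with a small sketch. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation. The helper name `region_masks_from_attention`, the averaging over heads, the min-max normalization, the argmax assignment, and the fixed `threshold` are all hypothetical choices made for clarity.

```python
import numpy as np


def region_masks_from_attention(attn, region_token_ids, threshold=0.5):
    """Convert cross-attention maps into per-region binary masks (hypothetical helper).

    attn: (heads, H*W, num_tokens) softmaxed cross-attention probabilities
          taken from one U-Net layer; the latent grid is assumed square.
    region_token_ids: maps a region name (e.g. "hair") to the indices of
          the prompt tokens that describe it.
    threshold: minimum normalised response for a pixel to be assigned.
    """
    _, hw, _ = attn.shape
    side = int(round(np.sqrt(hw)))
    avg = attn.mean(axis=0)  # average over heads -> (H*W, num_tokens)

    names = list(region_token_ids)
    # One saliency map per region: mean attention over that region's tokens.
    maps = np.stack([avg[:, region_token_ids[n]].mean(axis=1) for n in names])

    # Min-max normalise each map to [0, 1] so one threshold fits all regions.
    lo = maps.min(axis=1, keepdims=True)
    hi = maps.max(axis=1, keepdims=True)
    maps = (maps - lo) / np.maximum(hi - lo, 1e-8)

    # Each pixel goes to the region with the strongest response, provided
    # that response clears the threshold; the masks are therefore disjoint.
    winner = maps.argmax(axis=0)
    confident = maps.max(axis=0) >= threshold
    return {n: ((winner == i) & confident).reshape(side, side)
            for i, n in enumerate(names)}


# Usage with random attention over a 16x16 latent grid and 10 prompt tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 16 * 16, 10))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
masks = region_masks_from_attention(attn, {"hair": [2, 3], "clothes": [6]})
print({name: int(m.sum()) for name, m in masks.items()})
```

Assigning each pixel to a single winning region keeps the masks disjoint, which is one simple way to avoid two regions claiming the same pixels; the actual method may combine layers, timesteps, or thresholds differently.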

Abstract

Customized image generation, which seeks to synthesize images with consistent characters, is highly relevant to applications such as storytelling, portrait generation, and character design. However, previous approaches struggle to preserve characters with high-fidelity consistency because of inadequate feature extraction and concept confusion among reference characters. We therefore propose Character-Adapter, a plug-and-play framework that generates images preserving the details of reference characters with high-fidelity consistency. Character-Adapter employs prompt-guided segmentation to capture fine-grained regional features of reference characters and dynamic region-level adapters to mitigate concept confusion. Extensive experiments validate the effectiveness of Character-Adapter: both quantitative and qualitative results show that it achieves state-of-the-art performance in consistent character generation, with a 24.8% improvement over other methods. Our code will be released at https://github.com/Character-Adapter/Character-Adapter.
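The TL;DR reports gains in CLIP-I and DINO-I. As these metrics are commonly defined in subject-driven generation work, both are the mean pairwise cosine similarity between image embeddings of generated and reference images, differing only in the encoder (CLIP versus a DINO ViT). The sketch below assumes the embeddings have already been extracted; the function name `image_similarity_score` is hypothetical, and the paper's exact evaluation protocol (encoders, sample counts, preprocessing) is not stated in this section.

```python
import numpy as np


def image_similarity_score(gen_emb, ref_emb):
    """Mean pairwise cosine similarity between two sets of image embeddings.

    gen_emb: (N, D) embeddings of generated images.
    ref_emb: (M, D) embeddings of reference images.
    Fed CLIP image features this gives a CLIP-I style score; fed DINO ViT
    features it gives a DINO-I style score.
    """
    gen = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    # (N, M) matrix of cosine similarities, averaged over all pairs.
    return float((gen @ ref.T).mean())


# Usage with stand-in embeddings; real ones would come from a CLIP or DINO encoder.
rng = np.random.default_rng(0)
ref = rng.normal(size=(2, 512))
gen = ref[[0, 0, 1]] + 0.1 * rng.normal(size=(3, 512))
print(round(image_similarity_score(gen, ref), 3))
```

Because the score averages over every generated/reference pair, it rewards generations that stay close to all reference images rather than to just one.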

Paper Structure (27 sections, 16 equations, 10 figures, 7 tables, 1 algorithm)

Figures (10)

  • Figure 1: Images generated by Character-Adapter. Character-Adapter can be seamlessly integrated with any preferred model without extra training. It enables concept customization while preserving the high-fidelity appearance of the given characters (with no limit on the number of characters), including attributes such as hairstyle, identity, and attire.
  • Figure 2: Framework of Character-Adapter. Step 1 segments the reference characters from the given images and prompts using the prompt-guided segmentation module (Module a). Step 2 uses the same module to obtain attention maps of layout images generated solely from the given prompts. Step 3 generates images from the given prompt and semantic regions using the dynamic region-level adapters module (Module b).
  • Figure 3: Visual comparison of Character-Adapter against other subject-driven methods. Our approach ensures high-fidelity consistency while maintaining text-image alignment.
  • Figure 4: Visualization of the ablation study. Each component is removed individually to demonstrate its effectiveness; (d) shows the results obtained with the full Character-Adapter.
  • Figure 5: Visualization of Character-Adapter's versatility and compatibility. (a) Combination with pose control. (b) Inpainting with a reference image. (c) Generation with other subject types, such as animals.
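Step 3 of the framework injects each reference region's features only where that region should appear in the generated image. One way to picture this, assuming IP-Adapter-style decoupled cross-attention (an assumption; the exact adapter design is not described in this section), is to add each region's image-feature attention output to the ordinary text cross-attention, gated by that region's mask from Step 2. The names `region_fused_attention` and `attention`, and the single global `scale`, are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (P, d), k (T, d), v (T, d) -> (P, d).
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def region_fused_attention(q, text_kv, region_kvs, region_masks, scale=1.0):
    """Hypothetical region-level fusion step.

    q: (P, d) queries for P latent pixels.
    text_kv: (k, v) pair computed from the prompt embeddings.
    region_kvs: list of (k, v) pairs, one per reference region
        (e.g. features of the segmented hair, face, or clothes).
    region_masks: (R, P) array, 1 where region r should be injected.
    """
    out = attention(q, *text_kv)  # ordinary text cross-attention
    for (k, v), m in zip(region_kvs, region_masks):
        # Each region's reference features only affect pixels inside its mask.
        out = out + scale * m[:, None] * attention(q, k, v)
    return out


# Usage: 16 latent pixels, two regions splitting the grid in half.
rng = np.random.default_rng(0)
P, d = 16, 32
q = rng.normal(size=(P, d))
text_kv = (rng.normal(size=(7, d)), rng.normal(size=(7, d)))
region_kvs = [(rng.normal(size=(4, d)), rng.normal(size=(4, d))) for _ in range(2)]
masks = np.zeros((2, P))
masks[0, :8] = 1.0
masks[1, 8:] = 1.0
out = region_fused_attention(q, text_kv, region_kvs, masks, scale=0.6)
print(out.shape)
```

Gating by masks is what keeps, for example, one character's attire features from leaking into another character's region; a learned or per-region scale would be a natural variation.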