CharCom: Composable Identity Control for Multi-Character Story Illustration
Zhongsheng Wang, Ming Lin, Zhedong Lin, Yaser Shakib, Qian Liu, Jiamou Liu
TL;DR
CharCom tackles the challenge of maintaining character identity across multi-scene diffusion-generated stories by introducing per-character LoRA adapters that are trained independently on a frozen backbone and composed at inference. A structured, consistency-focused prompting scheme and a prompt-aware adapter fusion mechanism enable scalable, multi-character generation with minimal retraining overhead. Experimental results on a synthetic Arabic storytelling benchmark and human-guided evaluations show substantial gains in identity fidelity, prompt alignment, and temporal coherence over strong baselines, with robust performance as narrative complexity grows. The approach offers a practical, modular pathway toward real-world story illustration and animation, while identifying limitations in disambiguation and attention balance that point to future improvements in spatial reasoning and attention mechanisms.
Abstract
Ensuring character identity consistency across varying prompts remains a fundamental challenge in diffusion-based text-to-image generation. We propose CharCom, a modular and parameter-efficient framework that achieves character-consistent story illustration through composable LoRA adapters, enabling efficient per-character customization without retraining the base model. Built on a frozen diffusion backbone, CharCom dynamically composes adapters at inference using prompt-aware control. Experiments on multi-scene narratives demonstrate that CharCom significantly enhances character fidelity, semantic alignment, and temporal coherence. It remains robust in crowded scenes and enables scalable multi-character generation with minimal overhead, making it well-suited for real-world applications such as story illustration and animation.
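
To make the idea of composing per-character adapters on a frozen backbone concrete, the following is a minimal sketch and not the authors' implementation. It assumes each adapter is a standard low-rank update (an up matrix times a down matrix), that "prompt-aware control" can be approximated by activating an adapter when its character name appears in the prompt, and that active adapters are blended with equal shares. The names `CharacterAdapter`, `select_adapters`, and `compose_weights`, the character names, and the toy 2x2 weights are all hypothetical.

```python
from dataclasses import dataclass

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


@dataclass
class CharacterAdapter:
    """Hypothetical per-character LoRA adapter: delta = scale * (up @ down)."""
    name: str
    down: Matrix  # rank x in_dim
    up: Matrix    # out_dim x rank
    scale: float = 1.0

    def delta(self) -> Matrix:
        return [[self.scale * v for v in row] for row in matmul(self.up, self.down)]


def select_adapters(prompt: str, adapters: list[CharacterAdapter]) -> list[CharacterAdapter]:
    """Assumed gating rule: an adapter is active if its name occurs in the prompt."""
    text = prompt.lower()
    return [a for a in adapters if a.name.lower() in text]


def compose_weights(base: Matrix, active: list[CharacterAdapter]) -> Matrix:
    """Return base + equal-share blend of active deltas; the base is never modified."""
    merged = [row[:] for row in base]  # copy, so the frozen weights stay intact
    if not active:
        return merged
    share = 1.0 / len(active)  # assumption: equal weighting across characters
    for adapter in active:
        for i, row in enumerate(adapter.delta()):
            for j, v in enumerate(row):
                merged[i][j] += share * v
    return merged


# Toy frozen weight matrix and two independently defined rank-1 adapters.
BASE: Matrix = [[1.0, 0.0], [0.0, 1.0]]
ADAPTERS = [
    CharacterAdapter("Layla", down=[[0.0, 2.0]], up=[[1.0], [0.0]]),
    CharacterAdapter("Omar", down=[[4.0, 0.0]], up=[[0.0], [1.0]]),
]

solo = compose_weights(BASE, select_adapters("Layla reads alone", ADAPTERS))
duo = compose_weights(BASE, select_adapters("Layla and Omar walk through the market", ADAPTERS))
```

Because the base weights are copied rather than updated in place, adding a new character in this sketch only means appending another adapter to the list; nothing already trained needs to change. A real system would apply such updates per layer inside the diffusion model and would likely use a more careful fusion rule than equal shares.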
