ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models
Kiymet Akdemir, Pinar Yanardag
TL;DR
The paper tackles the problem of keeping a character's appearance consistent across contexts in text-to-image diffusion. It introduces ORACLE, a three-stage pipeline: it first generates a grid of candidate characters from a single prompt, then refines this set with mutual-information-based outlier filtering, and finally personalizes a LoRA model on the refined set to enable cross-context generation. Empirical results (qualitative comparisons, quantitative CLIP-based metrics, and a user study) show that ORACLE achieves a favorable balance between following prompts faithfully and preserving character identity, outperforming baselines such as The Chosen One, IP-Adapter, and LoRA-DB. The approach supports rapid, cohesive character design for comics, games, education, and related creative workflows by reducing manual curation and keeping characters visually consistent across scenes and media.
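The filtering stage is described here only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows one way mutual-information-based outlier filtering could work. It assumes a histogram estimate of mutual information over grayscale pixel intensities in [0, 1] and a mean-minus-standard-deviation cutoff; the names `mutual_information`, `filter_outliers`, `bins`, and `z_thresh` are hypothetical and do not come from the paper.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Histogram estimate of mutual information (in nats) between two
    same-sized grayscale images with values in [0, 1]."""
    joint, _, _ = np.histogram2d(
        a.ravel(), b.ravel(), bins=bins, range=[[0, 1], [0, 1]]
    )
    pxy = joint / joint.sum()            # joint distribution of intensity pairs
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def filter_outliers(images, z_thresh=1.0, bins=16):
    """Return indices of images whose mean mutual information with the
    other candidates is not far below the group average.

    Assumption (not from the paper): a candidate is treated as an outlier
    when its score falls below mean - z_thresh * std of all scores.
    """
    n = len(images)
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(images[i], images[j], bins)
    scores = mi.sum(axis=1) / (n - 1)    # mean MI with every other candidate
    cutoff = scores.mean() - z_thresh * scores.std()
    return [i for i in range(n) if scores[i] >= cutoff]
```

Under these assumptions, the indices returned by `filter_outliers` would select the refined candidate set that the third stage uses for LoRA personalization. A real pipeline might compute the score on learned image features rather than raw pixels; raw intensities are used here only to keep the sketch self-contained.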
Abstract
Text-to-image diffusion models have recently taken center stage as pivotal tools in promoting visual creativity across an array of domains such as comic book artistry, children's literature, game development, and web design. These models harness the power of artificial intelligence to convert textual descriptions into vivid images, thereby enabling artists and creators to bring their imaginative concepts to life with unprecedented ease. However, a significant hurdle persists: maintaining consistency in character generation across diverse contexts. Even minor variations in textual prompts can yield vastly different visual outputs, posing a considerable problem in projects that require a uniform representation of characters throughout. In this paper, we introduce a novel framework designed to produce consistent character representations from a single text prompt across diverse settings. Through both quantitative and qualitative analyses, we demonstrate that our framework outperforms existing methods in generating characters with consistent visual identities, underscoring its potential to transform creative industries. By addressing the critical challenge of character consistency, we not only enhance the practical utility of these models but also broaden the horizons for artistic and creative expression.
