RAVEL: Rare Concept Generation and Editing via Graph-driven Relational Guidance
Kavana Venkatesh, Yusuf Dalva, Ismini Lourentzou, Pinar Yanardag
TL;DR
RAVEL tackles the challenge of rare concept generation and culturally nuanced imagery by introducing a training-free, knowledge-graph–driven RAG framework that grounds prompts in structured relational context. It adds a Self-Correcting RAG-Guided Diffusion (SRD) loop that iteratively refines outputs using multi-aspect alignment and a decay-based prompt update scheme, improving attribute accuracy and narrative coherence. The approach is model-agnostic and evaluated on three new benchmarks—MythoBench, Rare-Concept-1K, and NovelBench—showing consistent gains in alignment, fidelity, and editing precision across SDXL, Flux, DALL-E 3, and ControlNet. These results demonstrate that structured KG grounding can robustly extend long-tail T2I capabilities without costly fine-tuning, with broad implications for controllable and interpretable generative imaging in diverse domains.
Abstract
Despite impressive visual fidelity, current text-to-image (T2I) diffusion models struggle to depict rare, complex, or culturally nuanced concepts due to training data limitations. We introduce RAVEL, a training-free framework that significantly improves rare concept generation, context-driven image editing, and self-correction by integrating graph-based retrieval-augmented generation (RAG) into diffusion pipelines. Unlike prior RAG and LLM-enhanced methods that rely on visual exemplars, static captions, or the models' own pre-trained knowledge, RAVEL leverages structured knowledge graphs to retrieve compositional, symbolic, and relational context, enabling nuanced grounding even in the absence of visual priors. To further refine generation quality, we propose Self-Correcting RAG-Guided Diffusion (SRD), a novel self-correction module that iteratively updates prompts via multi-aspect alignment feedback, enhancing attribute accuracy, narrative coherence, and semantic fidelity. Our framework is model-agnostic and compatible with leading diffusion models including Stable Diffusion XL, Flux, and DALL-E 3. We conduct extensive evaluations across three newly proposed benchmarks: MythoBench, Rare-Concept-1K, and NovelBench. RAVEL consistently outperforms state-of-the-art methods across perceptual, alignment, and LLM-as-a-Judge metrics. These results position RAVEL as a robust paradigm for controllable and interpretable T2I generation in long-tail domains.
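The retrieve-then-self-correct idea described above can be illustrated with a small toy sketch. This is not the authors' implementation: the triples, the function names (`retrieve_attributes`, `make_mock_generator`, `self_correct`), the `(attribute:weight)` prompt notation, and the specific decay rule are all assumptions made for illustration, and a mock function stands in for the diffusion model and the alignment checker.

```python
# Toy knowledge graph as (subject, relation, object) triples. Illustrative data only.
TRIPLES = [
    ("Garuda", "has_attribute", "golden wings"),
    ("Garuda", "has_attribute", "eagle beak"),
    ("Garuda", "mount_of", "Vishnu"),
]

def retrieve_attributes(entity, triples):
    """One-hop retrieval: objects of `has_attribute` triples for `entity`."""
    return [obj for subj, rel, obj in triples
            if subj == entity and rel == "has_attribute"]

def make_mock_generator(mentions_needed):
    """Stand-in for 'generate an image, then detect what it depicts'.

    An attribute counts as depicted once the prompt mentions it at least
    `mentions_needed[attr]` times (a crude proxy for hard-to-render attributes).
    """
    def generate(prompt):
        return {attr for attr, n in mentions_needed.items()
                if prompt.count(attr) >= n}
    return generate

def self_correct(base_prompt, required, generate, max_iters=3, decay=0.5):
    """Iteratively append missing attributes with a decaying emphasis weight."""
    prompt, history = base_prompt, []
    for step in range(max_iters):
        depicted = generate(prompt)
        missing = [a for a in required if a not in depicted]
        history.append((prompt, missing))
        if not missing:
            break
        weight = 1 + decay ** step  # assumed rule: corrections weaken each round
        additions = ", ".join(f"({a}:{weight:.2f})" for a in missing)
        prompt = f"{prompt}, {additions}"
    return prompt, history

required = retrieve_attributes("Garuda", TRIPLES)
generator = make_mock_generator({"golden wings": 1, "eagle beak": 2})
final_prompt, history = self_correct("Garuda in flight", required, generator)
print(final_prompt)
```

In this toy run the first round adds both retrieved attributes, the second round re-adds only the one still missing at a lower weight, and the third round finds nothing missing and stops. A real pipeline would replace the mock with an actual T2I model and a multi-aspect alignment scorer.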
