RAVEL: Rare Concept Generation and Editing via Graph-driven Relational Guidance

Kavana Venkatesh, Yusuf Dalva, Ismini Lourentzou, Pinar Yanardag

TL;DR

RAVEL tackles rare concept generation and culturally nuanced imagery by introducing a training-free, knowledge-graph-driven RAG framework that grounds prompts in structured relational context. It adds a Self-Correcting RAG-Guided Diffusion (SRD) loop that iteratively refines outputs using multi-aspect alignment and a decay-based prompt update scheme, improving attribute accuracy and narrative coherence. The approach is model-agnostic and is evaluated on three new benchmarks (MythoBench, Rare-Concept-1K, and NovelBench), showing consistent gains in alignment, fidelity, and editing precision across SDXL, Flux, DALL-E 3, and ControlNet. These results suggest that structured KG grounding can extend long-tail T2I capabilities without costly fine-tuning, supporting more controllable and interpretable image generation in long-tail domains.

Abstract

Despite impressive visual fidelity, current text-to-image (T2I) diffusion models struggle to depict rare, complex, or culturally nuanced concepts due to training data limitations. We introduce RAVEL, a training-free framework that significantly improves rare concept generation, context-driven image editing, and self-correction by integrating graph-based retrieval-augmented generation (RAG) into diffusion pipelines. Unlike prior RAG and LLM-enhanced methods that rely on visual exemplars, static captions, or the models' pre-trained knowledge, RAVEL leverages structured knowledge graphs to retrieve compositional, symbolic, and relational context, enabling nuanced grounding even in the absence of visual priors. To further refine generation quality, we propose SRD, a novel self-correction module that iteratively updates prompts via multi-aspect alignment feedback, enhancing attribute accuracy, narrative coherence, and semantic fidelity. Our framework is model-agnostic and compatible with leading diffusion models including Stable Diffusion XL, Flux, and DALL-E 3. We conduct extensive evaluations across three newly proposed benchmarks: MythoBench, Rare-Concept-1K, and NovelBench. RAVEL consistently outperforms SOTA methods across perceptual, alignment, and LLM-as-a-Judge metrics. These results position RAVEL as a robust paradigm for controllable and interpretable T2I generation in long-tail domains.
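
The retrieve-then-self-correct flow described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the triples, the function names (retrieve_context, alignment_scores, self_correct), the two "aspects", and the interpretation of decay as a shrinking per-iteration budget of added details are hypothetical and are not taken from the paper. The describe callback stands in for "generate an image, then caption or judge it"; here it simply echoes the prompt so the example runs without any model.

```python
import math

# Illustrative triples only; NOT the paper's knowledge graph.
TRIPLES = [
    ("Garuda", "has_attribute", "golden wings"),
    ("Garuda", "has_attribute", "eagle beak"),
    ("Garuda", "carries", "Vishnu"),
    ("Vishnu", "holds", "conch shell"),
]

def retrieve_context(entity, triples, hops=2):
    """Breadth-first walk collecting triples within `hops` of `entity`."""
    frontier, found = {entity}, []
    for _ in range(hops):
        step = [t for t in triples if t[0] in frontier and t not in found]
        found.extend(step)
        frontier = {obj for _, _, obj in step}
    return found

def to_phrase(triple):
    """Turn a triple into a short prompt fragment (hypothetical template)."""
    subj, rel, obj = triple
    if rel == "has_attribute":
        return f"{subj} with {obj}"
    return f"{subj} {rel} {obj}"

def alignment_scores(caption, triples):
    """Toy multi-aspect check: share of attribute / relation phrases in the caption."""
    aspects = (
        ("attributes", lambda rel: rel == "has_attribute"),
        ("relations", lambda rel: rel != "has_attribute"),
    )
    scores = {}
    for name, keep in aspects:
        phrases = [to_phrase(t) for t in triples if keep(t[1])]
        hits = sum(p.lower() in caption.lower() for p in phrases)
        scores[name] = hits / len(phrases) if phrases else 1.0
    return scores

def self_correct(prompt, triples, describe, max_iters=4, budget=2, decay=0.5):
    """Append missing context each round; the per-round budget shrinks by `decay`."""
    history = []
    for step in range(max_iters):
        caption = describe(prompt)  # stand-in for generate + caption/judge
        scores = alignment_scores(caption, triples)
        history.append((prompt, scores))
        if min(scores.values()) >= 1.0:
            break
        missing = [to_phrase(t) for t in triples
                   if to_phrase(t).lower() not in caption.lower()]
        # Assumed decay rule: add fewer new details in later rounds (at least one).
        k = max(1, math.floor(budget * decay ** step))
        prompt = prompt + ", " + ", ".join(missing[:k])
    return prompt, history

context = retrieve_context("Garuda", TRIPLES, hops=2)
final_prompt, history = self_correct("a painting of Garuda", context,
                                     describe=lambda p: p)
```

In a real pipeline, describe would wrap a diffusion model plus a captioner or LLM judge, and the scoring would come from learned alignment metrics rather than substring matching. The sketch only shows the control flow: retrieve relational context, score the output per aspect, and feed back progressively smaller prompt updates.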

Paper Structure

This paper contains 12 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: We introduce RAVEL, a training-free approach that uses graph-based RAG to enhance T2I models with context-aware guidance. It improves generation of rare, complex concepts and supports disentangled image editing. A self-correction module further refines visual and narrative accuracy.
  • Figure 2: Across a variety of domains, RAVEL enhances image generation by integrating contextual details that standard models often overlook. Note that the reference images are shown solely for illustrative purposes and are not used by our framework.
  • Figure 3: Our method enhances disentangled editing by adding relationally accurate elements without explicit instructions, while ControlNet either adds generic objects or fails to make any edit.
  • Figure 4: Our self-correction mechanism ensures accurate depictions of concepts via iterative, context-aware prompt refinement.
  • Figure 5: Ablation study demonstrating how different retrieval and prompting strategies contribute to RAVEL's effectiveness in enhancing T2I models.