ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Rotem Shalev-Arkushin; Rinon Gal; Amit H. Bermano; Ohad Fried

TL;DR

Diffusion-based text-to-image models struggle with rare or unseen concepts. ImageRAG dynamically retrieves reference images to guide sampling, requires no RAG-specific training, and works with multiple base models and prompting controls. A vision-language model identifies the concepts missing from an initial generation and writes retrieval captions for them; matching images are then retrieved from a large dataset via CLIP-based similarity and used to augment the prompt. With both OmniGen and SDXL as base models, ImageRAG improves rare-concept generation and receives favorable qualitative feedback, showing practical, model-agnostic benefits for reference-guided image synthesis.

Abstract

Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) with image generation models. We propose ImageRAG, a method that dynamically retrieves relevant images based on a given text prompt and uses them as context to guide the generation process. Prior approaches that used retrieved images to improve generation trained models specifically for retrieval-based generation. In contrast, ImageRAG leverages the capabilities of existing image-conditioning models and does not require RAG-specific training. Our approach is highly adaptable and can be applied across different model types, showing significant improvement in generating rare and fine-grained concepts using different base models. Our project page is available at: https://rotem-shalev.github.io/ImageRAG
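
The retrieval step mentioned in the TL;DR (ranking dataset images by CLIP-based similarity to a retrieval caption) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `retrieve_top_k` are hypothetical, and the short toy vectors stand in for CLIP text and image embeddings, which a real system would compute with a CLIP encoder. The sketch assumes the ranking uses cosine similarity, a common choice for CLIP embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    caption_embedding: Sequence[float],
    image_embeddings: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k image ids most similar to the caption embedding.

    Hypothetical helper: `image_embeddings` maps an image id to a
    precomputed embedding (assumed to come from a CLIP image encoder).
    """
    scored = [
        (image_id, cosine_similarity(caption_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real CLIP embeddings.
toy_index = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.6, 0.8, 0.0],
    "img_c": [0.0, 0.0, 1.0],
}
toy_caption = [0.9, 0.1, 0.0]

top = retrieve_top_k(toy_caption, toy_index, k=2)
for image_id, score in top:
    print(f"{image_id}: {score:.3f}")
```

In a full pipeline, the retrieved images would then be passed as reference context to an image-conditioned generator alongside the text prompt; that step depends on the base model and is not shown here.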
