Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection

Yu Li; Xingyu Qiu; Yuqian Fu; Jie Chen; Tianwen Qian; Xu Zheng; Danda Pani Paudel; Yanwei Fu; Xuanjing Huang; Luc Van Gool; Yu-Gang Jiang

TL;DR

This paper addresses cross-domain few-shot object detection (CD-FSOD) by introducing Domain-RAG, a training-free, retrieval-guided compositional image generation framework that keeps the foreground fixed and adapts the background. It uses a three-stage pipeline (domain-aware background retrieval, domain-guided background generation, and foreground-background composition) to synthesize domain-aligned training samples without any additional supervision or training. The method draws retrieval priors from COCO and uses the Redux and Flux diffusion tools to produce high-quality, domain-consistent backgrounds, which it then composites seamlessly with the preserved foregrounds. Domain-RAG achieves state-of-the-art results on CD-FSOD, remote sensing FSOD (RS-FSOD), and camouflaged FSOD, showing robust improvements in low-shot, cross-domain settings and serving as a practical, plug-and-play augmentation for detection models.

Abstract

Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to detect novel objects with only a handful of labeled samples from previously unseen domains. While data augmentation and generative methods have shown promise in few-shot learning, their effectiveness for CD-FSOD remains unclear due to the need for both visual realism and domain alignment. Existing strategies, such as copy-paste augmentation and text-to-image generation, often fail to preserve the correct object category or produce backgrounds coherent with the target domain, making them non-trivial to apply directly to CD-FSOD. To address these challenges, we propose Domain-RAG, a training-free, retrieval-guided compositional image generation framework tailored for CD-FSOD. Domain-RAG consists of three stages: domain-aware background retrieval, domain-guided background generation, and foreground-background composition. Specifically, the input image is first decomposed into foreground and background regions. We then retrieve semantically and stylistically similar images to guide a generative model in synthesizing a new background, conditioned on both the original and retrieved contexts. Finally, the preserved foreground is composed with the newly generated domain-aligned background to form the generated image. Without requiring any additional supervision or training, Domain-RAG produces high-quality, domain-consistent samples across diverse tasks, including CD-FSOD, remote sensing FSOD, and camouflaged FSOD. Extensive experiments show consistent improvements over strong baselines and establish new state-of-the-art results. Code will be released upon acceptance.
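
The three stages described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names, the cosine-similarity ranking, the grayscale nested-list image format, and the `generate_background` callable (a stand-in for the generative model) are all assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image: rows of pixel values
Mask = List[List[int]]     # 1 marks a foreground pixel, 0 a background pixel


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_backgrounds(query_feat: Sequence[float],
                         gallery_feats: Sequence[Sequence[float]],
                         k: int) -> List[int]:
    """Stage 1 (assumed form): indices of the k gallery entries most similar to the query."""
    ranked = sorted(range(len(gallery_feats)),
                    key=lambda i: cosine(query_feat, gallery_feats[i]),
                    reverse=True)
    return ranked[:k]


def compose(image: Image, mask: Mask, background: Image) -> Image:
    """Stage 3 (assumed form): keep masked foreground pixels, fill the rest from the new background."""
    return [[fg if m else bg for fg, m, bg in zip(img_row, mask_row, bg_row)]
            for img_row, mask_row, bg_row in zip(image, mask, background)]


def domain_rag_sketch(image: Image,
                      mask: Mask,
                      query_feat: Sequence[float],
                      gallery_feats: Sequence[Sequence[float]],
                      gallery_images: Sequence[Image],
                      generate_background: Callable[[Image, Mask, List[Image]], Image],
                      k: int = 1) -> Image:
    """Chain the three stages: retrieve references, generate a background, composite."""
    top = retrieve_backgrounds(query_feat, gallery_feats, k)
    references = [gallery_images[i] for i in top]
    # Stage 2: hypothetical hook where a generative model would synthesize the
    # background, conditioned on the original image and the retrieved references.
    new_background = generate_background(image, mask, references)
    return compose(image, mask, new_background)


# Toy usage: the "generator" simply returns the best-matching retrieved image.
image = [[9, 9], [9, 9]]
mask = [[1, 0], [0, 0]]
gallery_feats = [[1.0, 0.0], [0.0, 1.0]]
gallery_images = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]
result = domain_rag_sketch(image, mask, [0.1, 0.9], gallery_feats, gallery_images,
                           lambda img, m, refs: refs[0])
```

In this toy run the query feature is closest to the second gallery entry, so the output keeps the single foreground pixel and takes every other pixel from that entry. In practice the stub generator would be replaced by a diffusion-based model, and the features would come from an image encoder rather than hand-written vectors.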
