ReFIR: Grounding Large Restoration Models with Retrieval Augmentation
Hang Guo, Tao Dai, Zhihao Ouyang, Taolin Zhang, Yaohua Zha, Bin Chen, Shu-tao Xia
TL;DR
This paper tackles hallucination in diffusion-based large restoration models (LRMs) by introducing ReFIR, a training-free retrieval-augmented framework that grounds restoration in retrieved high-quality reference images. It couples a nearest-neighbor reference retriever with a cross-image injection mechanism that uses separate attention, spatial adaptive gating, and distribution alignment to transfer textures from the references into the detail-texture restoration stage of LRMs. The approach is model-agnostic and reports consistent improvements in both fidelity and perceptual quality across datasets without retraining, supporting the value of external knowledge grounding in image restoration. In practice, ReFIR can be plugged into existing LRMs to reduce hallucinations under challenging real-world degradations while maintaining efficiency and generality.
Abstract
Recent advances in diffusion-based Large Restoration Models (LRMs) have significantly improved photo-realistic image restoration by leveraging the internal knowledge embedded within model weights. However, existing LRMs often suffer from the hallucination dilemma, i.e., producing incorrect contents or textures when dealing with severe degradations, due to their heavy reliance on limited internal knowledge. In this paper, we propose an orthogonal solution called the Retrieval-augmented Framework for Image Restoration (ReFIR), which incorporates retrieved images as external knowledge to extend the knowledge boundary of existing LRMs and generate details faithful to the original scene. Specifically, we first introduce a nearest-neighbor lookup to retrieve content-relevant high-quality images as references, after which we propose cross-image injection to modify existing LRMs so that they utilize high-quality textures from the retrieved images. Thanks to this additional external knowledge, ReFIR mitigates the hallucination problem and produces faithful results. Extensive experiments demonstrate that ReFIR achieves restoration results that are both high-fidelity and realistic. Importantly, ReFIR requires no training and is adaptable to various LRMs.
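The nearest-neighbor lookup described in the abstract can be illustrated with a minimal sketch. The abstract does not state which features or similarity measure the retriever uses, so the following assumes each image has already been mapped to a feature vector and ranks candidates by cosine similarity; the names `cosine_similarity` and `retrieve_references` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_references(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int = 1,
) -> List[int]:
    """Return the indices of the k database features most similar to the query.

    Assumption: `query` is the feature of the degraded input and `database`
    holds features of high-quality candidate images. How these features are
    extracted is outside the scope of this sketch.
    """
    scored = [
        (cosine_similarity(query, feat), idx)
        for idx, feat in enumerate(database)
    ]
    # Highest similarity first; ties keep their original database order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D features standing in for real image descriptors.
    database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    query = [0.9, 0.1]
    print(retrieve_references(query, database, k=2))
```

In this toy example the query is closest to the first and third database entries, so their indices are returned in that order. The retrieved images would then serve as references for the cross-image injection step; a brute-force scan like this is only practical for small databases, and a larger one would likely call for an approximate nearest-neighbor index.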
