LLM-Oriented Retrieval Tuner
Si Sun, Hanqing Zhang, Zhiyuan Liu, Jie Bao, Dawei Song
TL;DR
The paper tackles the challenge of integrating dense retrieval (DR) with large language models without fine-tuning the LLM itself. It analyzes layer-wise alignment and uniformity in frozen LLMs and shows that these properties reside in different layers, motivating LMORT, a lightweight tuner that uses two bidirectional attention blocks to fuse the best-aligned and most uniform layers into a unified DR space. By freezing the LLM and training only LMORT, the approach achieves competitive zero-shot BEIR performance across several base LLMs with significantly fewer trained parameters and less training time. This plugin-style method enables memory-augmented generation by coupling external retrieval with generation without compromising the LLM’s versatility across tasks.
Abstract
Dense Retrieval (DR) is now considered a promising tool for enhancing the memorization capacity of Large Language Models (LLMs) such as GPT-3 and GPT-4 by incorporating external memories. However, because the text generation paradigm of LLMs differs from that of DR, integrating the retrieval and generation tasks in a shared LLM remains an open challenge. In this paper, we propose an efficient LLM-Oriented Retrieval Tuner, namely LMORT, which decouples DR capacity from the base LLM and non-invasively coordinates the optimally aligned and uniform layers of the LLM towards a unified DR space, achieving efficient and effective DR without tuning the LLM itself. Extensive experiments on six BEIR datasets show that our approach achieves competitive zero-shot retrieval performance compared to a range of strong DR models while maintaining the generation ability of the LLM.
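To make the notions of "aligned" and "uniform" representations more concrete, the following is a minimal sketch that assumes the definitions commonly used in the contrastive representation learning literature: alignment as the mean squared distance between normalized embeddings of positive pairs, and uniformity as the log of the mean Gaussian potential over all pairs. The function names and the temperature `t=2.0` are illustrative assumptions, not the authors' code; a layer-wise analysis could, in principle, apply such metrics to embeddings pooled from each frozen LLM layer.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(pairs):
    """Mean squared distance between normalized embeddings of positive
    (query, passage) pairs. Lower values mean better alignment."""
    total = sum(sq_dist(l2_normalize(q), l2_normalize(p)) for q, p in pairs)
    return total / len(pairs)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs of
    normalized embeddings. Lower values mean a more uniform spread.
    The temperature t is an assumed default, not taken from the paper."""
    normed = [l2_normalize(e) for e in embeddings]
    potentials = [math.exp(-t * sq_dist(a, b)) for a, b in combinations(normed, 2)]
    return math.log(sum(potentials) / len(potentials))
```

Under these assumed definitions, identical positive pairs give an alignment of 0, and embeddings that collapse onto a single point give a uniformity of 0, the worst possible value; spreading embeddings apart drives uniformity negative.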
