LLM-Oriented Retrieval Tuner

Si Sun, Hanqing Zhang, Zhiyuan Liu, Jie Bao, Dawei Song

TL;DR

The paper tackles the challenge of integrating dense retrieval (DR) with large language models (LLMs) without fine-tuning the LLM itself. It analyzes layer-wise alignment and uniformity in frozen LLMs and shows that the two properties are optimal at different layers. This motivates a lightweight tuner, LMORT, that uses two bidirectional attention blocks to fuse the best alignment layer and the best uniformity layer into a unified DR space. By freezing the LLM and training only LMORT, the approach achieves competitive zero-shot BEIR performance across several base LLMs with far fewer trainable parameters and less training time, illustrating its efficiency and scalability. This plugin-style design enables memory-augmented generation by coupling external retrieval with generation without compromising the LLM’s versatility across tasks.
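
The layer-wise analysis can be sketched as follows. This sketch assumes the standard alignment and uniformity losses of Wang and Isola (2020), which this summary does not spell out:

- Alignment is the mean squared distance between L2-normalized query and relevant-passage embeddings.
- Uniformity is the log of the mean Gaussian potential exp(-t * ||x - y||^2) over all embedding pairs.

Lower is better for both. The per-layer embeddings (for example, pooled hidden states of a frozen LLM), the helper pick_layers, and the toy inputs are illustrative assumptions, not the paper's code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment_loss(pairs, alpha=2):
    """Mean ||f(q) - f(p)||^alpha over positive (query, passage) pairs; lower is better."""
    total = sum(sq_dist(l2_normalize(q), l2_normalize(p)) ** (alpha / 2) for q, p in pairs)
    return total / len(pairs)


def uniformity_loss(embs, t=2.0):
    """log of mean exp(-t * ||x - y||^2) over all distinct pairs; lower is better."""
    xs = [l2_normalize(e) for e in embs]
    vals = [math.exp(-t * sq_dist(xs[i], xs[j]))
            for i in range(len(xs)) for j in range(i + 1, len(xs))]
    return math.log(sum(vals) / len(vals))


def pick_layers(layer_pairs):
    """Hypothetical helper: layer_pairs maps a layer id to its (query_emb, passage_emb)
    pairs; returns (best alignment layer, best uniformity layer)."""
    align = {layer: alignment_loss(ps) for layer, ps in layer_pairs.items()}
    unif = {layer: uniformity_loss([e for pair in ps for e in pair])
            for layer, ps in layer_pairs.items()}
    return min(align, key=align.get), min(unif, key=unif.get)


# Toy per-layer embeddings (illustrative only, not real LLM hidden states).
layers = {
    "A": [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.1], [1.0, 0.1])],    # tight pairs, clustered
    "U": [([1.0, 0.0], [0.0, 1.0]), ([-1.0, 0.0], [0.0, -1.0])],  # loose pairs, spread out
}
print(pick_layers(layers))  # ('A', 'U')
```

In the toy input, layer "A" pairs identical vectors, giving perfect alignment but poor uniformity. Layer "U" spreads its vectors around the circle. The two losses therefore select different layers, which is the kind of split the TL;DR describes.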

Abstract

Dense Retrieval (DR) is now considered a promising tool for enhancing the memorization capacity of Large Language Models (LLMs) such as GPT-3 and GPT-4 by incorporating external memories. However, due to the paradigm discrepancy between the text generation of LLMs and DR, integrating the retrieval and generation tasks in a shared LLM remains an open challenge. In this paper, we propose an efficient LLM-Oriented Retrieval Tuner, namely LMORT, which decouples DR capacity from the base LLM and non-invasively coordinates the optimally aligned and uniform layers of the LLM towards a unified DR space, achieving efficient and effective DR without tuning the LLM itself. Extensive experiments on six BEIR datasets show that our approach achieves competitive zero-shot retrieval performance compared to a range of strong DR models while maintaining the generation ability of the LLM.
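
The abstract describes LMORT as a tuner that sits on top of a frozen LLM and coordinates its optimally aligned and uniform layers. The sketch below is one hypothetical reading of that design, not the authors' implementation:

- Each TunerBlock (a name invented here) applies unmasked, bidirectional self-attention over the hidden states of the uniformity layer.
- It then applies cross-attention to the hidden states of the alignment layer, with a residual connection after each step.
- Multiple heads, layer normalization, and feed-forward sublayers are omitted.
- Mean pooling of the output is an assumption.

Random vectors stand in for the frozen LLM's hidden states. Only the tuner's weights would be trained.

```python
import math
import random


def matvec(w, x):
    """Multiply a (dim x dim) matrix, given as a list of rows, by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention with no causal mask, so every query attends to
    every key position (bidirectional), unlike the LLM's left-to-right attention."""
    d = len(keys[0])
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(wi * v[i] for wi, v in zip(w, values)) for i in range(len(values[0]))])
    return out


def add(xs, ys):
    """Element-wise sum of two equally shaped lists of vectors (residual connection)."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


class TunerBlock:
    """Hypothetical LMORT-style block: self-attention over uniformity-layer states,
    then cross-attention to alignment-layer states. LayerNorm and FFN are omitted."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]

        self.self_q, self.self_k, self.self_v = init(), init(), init()
        self.cross_q, self.cross_k, self.cross_v = init(), init(), init()

    def __call__(self, h, h_align):
        q = [matvec(self.self_q, x) for x in h]
        k = [matvec(self.self_k, x) for x in h]
        v = [matvec(self.self_v, x) for x in h]
        h = add(h, attention(q, k, v))
        q = [matvec(self.cross_q, x) for x in h]
        k = [matvec(self.cross_k, x) for x in h_align]
        v = [matvec(self.cross_v, x) for x in h_align]
        return add(h, attention(q, k, v))


def encode(h_uniform, h_align, blocks):
    """Run the trainable blocks over frozen-LLM hidden states and mean-pool
    the result into a single dense-retrieval vector."""
    h = h_uniform
    for block in blocks:
        h = block(h, h_align)
    return [sum(col) / len(h) for col in zip(*h)]


# Random stand-ins for the frozen LLM's hidden states at the two chosen layers.
rng = random.Random(1)
h_u = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]  # uniformity layer, 3 tokens
h_a = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]  # alignment layer, 3 tokens
tuner = [TunerBlock(4, seed=s) for s in range(2)]  # a stack of M = 2 tuner layers (illustrative)
print(len(encode(h_u, h_a, tuner)))  # 4
```

Dropping the causal mask is what makes the attention bidirectional. A retrieval embedding can then draw on the whole input rather than only on a left-to-right prefix.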

Paper Structure (18 sections, 8 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Layer-wise alignment and uniformity analysis in GPT-J-6B. The redder the color, the better the alignment and uniformity; the bluer the color, the worse they are. The X-axis denotes the six BEIR datasets and their average. The Y-axis represents the layer index of GPT-J-6B (e.g., #1 is the embedding layer and #29 is the last hidden layer).
  • Figure 2: Illustration of the LLM-Oriented Retrieval Tuner (LMORT). LMORT has far fewer layers than the frozen LLM ($M \ll N$).
  • Figure 3: Average layer-wise alignment and uniformity estimates over six BEIR datasets. The redder the color, the better the alignment and uniformity; the bluer the color, the worse they are. The Y-axis represents the layer index of each of the three GPT models.
  • Figure 4: Average NDCG@10 of LMORT with different numbers of layers on three LLMs (GPT2-Large, GPT2-XL, GPT-J-6B). The X-axis denotes the total number of LMORT layers; the Y-axis denotes the NDCG@10 score averaged over six BEIR datasets.
  • Figure 5: Alignment and uniformity analysis of LMORT (GPT-J-6B) on six BEIR datasets. #O denotes the output layer of LMORT; #A and #U denote the optimal alignment and uniformity layers of the LLM, respectively. The lower the loss, the better the alignment and uniformity; the higher the loss, the worse they are.
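
Retrieval quality in Figure 4 and the BEIR comparison is reported as NDCG@10. Below is a minimal sketch of the metric for a single dataset. It assumes graded relevance judgments and the linear-gain, log2-discount formulation used by trec_eval. The function names and toy data are illustrative. Following the Figure 4 caption, the plotted value would then be the mean of such per-dataset scores over the six BEIR datasets.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain with linear gains; enumerate() is 0-based,
    so the 1-based rank r gets discount log2(r + 1) = log2(idx + 2)."""
    return sum(g / math.log2(idx + 2) for idx, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """ranked_ids: retrieved doc ids, best first; qrels: {doc_id: graded relevance}."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def mean_ndcg_at_k(run, qrels_by_query, k=10):
    """Average NDCG@k over the judged queries of one dataset."""
    scores = [ndcg_at_k(run.get(qid, []), rels, k) for qid, rels in qrels_by_query.items()]
    return sum(scores) / len(scores)


# Toy data: q1 ranks its relevant passage first, q2 ranks it second.
qrels = {"q1": {"d1": 1}, "q2": {"d7": 1}}
run = {"q1": ["d1", "d4"], "q2": ["d3", "d7"]}
print(round(mean_ndcg_at_k(run, qrels), 4))  # 0.8155
```

Normalizing by the ideal DCG keeps each query's score in [0, 1]. This makes averaging across queries, and then across datasets, meaningful.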