BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain

Kaisi Guan, Qian Cao, Yuchong Sun, Xiting Wang, Ruihua Song

TL;DR

A novel Backbone Shared RAG framework that first continually pre-trains a base model on a domain-specific corpus to obtain a domain-specific backbone model, and then trains two plug-and-play Low-Rank Adaptation (LoRA) modules on this shared backbone to minimize the retrieval and generation losses, respectively.
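To make the shared-backbone idea concrete, here is a minimal pure-Python sketch of a single linear layer whose frozen weight matrix is shared by two plug-and-play LoRA adapters. It is not the authors' implementation: the class and method names (`SharedBackboneLinear`, `add_adapter`, `forward`), the tiny dimensions, and the random initialization are hypothetical. In the actual framework the backbone is a continually pre-trained LLM, and LoRA updates would sit inside its layers rather than in one toy layer.

```python
# Hypothetical sketch: one frozen backbone weight shared by two LoRA adapters.
# Only the adapter named in forward() is applied, mimicking plug-and-play use.
import random
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small Gaussian-initialized matrix."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class SharedBackboneLinear:
    """Frozen weight W (d_out x d_in) plus named adapters A (r x d_in), B (d_out x r)."""

    def __init__(self, d_in: int, d_out: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.d_in, self.d_out = d_in, d_out
        self.weight = random_matrix(d_out, d_in, self.rng)  # shared, never updated
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def add_adapter(self, name: str, rank: int, alpha: float) -> None:
        # Usual LoRA init: A random, B zero, so a fresh adapter is a no-op.
        a = random_matrix(rank, self.d_in, self.rng)
        b = [[0.0] * rank for _ in range(self.d_out)]
        self.adapters[name] = (a, b, alpha / rank)

    def forward(self, x: Vector, adapter: Optional[str] = None) -> Vector:
        y = matvec(self.weight, x)  # backbone path, identical for both tasks
        if adapter is not None:
            a, b, scale = self.adapters[adapter]
            delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


layer = SharedBackboneLinear(d_in=8, d_out=4)
layer.add_adapter("retrieval", rank=2, alpha=4.0)
layer.add_adapter("generation", rank=2, alpha=4.0)
print(layer.forward([1.0] * 8, adapter="retrieval"))
```

In this sketch each adapter only adds its own term `scale * B(Ax)` to the shared backbone output, so changing the retrieval adapter leaves generation outputs untouched. That separation is what allows the two modules to be trained against their own losses and swapped in at inference time.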

Abstract

Retrieval-Augmented Generation (RAG) systems are important in domains such as e-commerce, which involve many long-tail entities and frequently updated information. Most existing works use separate modules for retrieval and generation, which may be suboptimal because the retrieval and generation tasks cannot benefit from each other. We propose a novel Backbone Shared RAG framework (BSharedRAG). It first uses a domain-specific corpus to continually pre-train a base model into a domain-specific backbone model, and then trains two plug-and-play Low-Rank Adaptation (LoRA) modules on the shared backbone to minimize the retrieval and generation losses, respectively. Experimental results show that BSharedRAG outperforms baseline models by 5% and 13% in Hit@3 on two datasets in retrieval evaluation, and by 23% in BLEU-3 in generation evaluation. Our code, models, and dataset are available at https://bsharedrag.github.io.
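The two headline metrics can be sketched as follows. This is a hedged illustration, not the paper's evaluation code. It assumes Hit@3 counts a query as a hit when at least one relevant document appears among the top 3 retrieved, and that BLEU-3 is unsmoothed sentence-level BLEU with uniform weights over 1- to 3-grams and a single reference. Tokenization (e.g., characters vs. words for Chinese text), smoothing, and corpus-level vs. sentence-level aggregation are not specified here and may differ from the authors' setup. The function names are hypothetical.

```python
# Hypothetical metric sketches: Hit@k for retrieval, BLEU-N (N=3) for generation.
import math
from collections import Counter
from typing import Sequence, Set


def hit_at_k(ranked_ids: Sequence[Sequence[str]],
             relevant_ids: Sequence[Set[str]], k: int = 3) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    if not ranked_ids:
        return 0.0
    hits = sum(
        1 for ranked, relevant in zip(ranked_ids, relevant_ids)
        if any(doc in relevant for doc in ranked[:k])
    )
    return hits / len(ranked_ids)


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int = 3) -> float:
    """Unsmoothed sentence-level BLEU-max_n with uniform weights, one reference."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision += math.log(clipped / total) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)
```

For example, `hit_at_k([["d1", "d2", "d3"], ["d4", "d5", "d6"]], [{"d3"}, {"d9"}])` returns 0.5, since only the first query has a relevant document in its top 3.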

Paper Structure

This paper contains 40 sections, 6 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Comparison of three categories of possible RAG frameworks: (a) most previous works fall into the separate RAG category, in which the retrieval and generation tasks cannot benefit from each other; (b) only a few attempts explore fully shared RAG, which may suffer performance degradation due to negative transfer and requires laborious loss balancing to determine $\lambda$; (c) our proposed backbone shared RAG ensures effective knowledge transfer between the two tasks without the need for laborious loss balancing.
  • Figure 2: Overview of training and inference of our proposed BSharedRAG Framework.
  • Figure 3: Influence of different retrievers on generation effectiveness. CPT denotes continual pre-training, which, through the shared LLM backbone, substantially improves retrieval effectiveness. Accuracy is judged by GPT-4. Other metrics are omitted due to space limits, but they show similar trends.
  • Figure 4: A representative example comparing our BSharedRAG with a separate RAG. For the given question, the BSharedRAG retriever favors documents containing sentences that are easy to generate when the question is used as the prompt. In contrast, the BERT-like BGE-large-zh model tends to retrieve documents containing sentences that closely match the question. However, such documents may be less suitable for answer generation, e.g., because important information is missing or the content is hard for the generator to use.
  • Figure 5: Some of the categories in the WorthBuying dataset.
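The contrast drawn in Figure 1 between (b) and (c) can be summarized as two training objectives. This is a hedged sketch in our own notation: only $\lambda$ comes from the caption, while the weighted-sum form of the fully shared objective and the symbols $\theta$, $\theta_b$, $\Delta_{\mathrm{ret}}$, $\Delta_{\mathrm{gen}}$, $A$, $B$, $r$ are assumptions. Here $\theta_b$ stands for the backbone obtained by continual pre-training on the domain corpus, and it is assumed to stay frozen while the adapters are trained, as is usual for LoRA.

```latex
% (b) Fully shared RAG: one parameter set, one weighted objective
\theta^{*} = \arg\min_{\theta} \; \mathcal{L}_{\mathrm{ret}}(\theta) + \lambda \, \mathcal{L}_{\mathrm{gen}}(\theta)

% (c) Backbone shared RAG: frozen shared backbone \theta_b, one LoRA update per task
\Delta_{\mathrm{ret}}^{*} = \arg\min_{\Delta_{\mathrm{ret}}} \mathcal{L}_{\mathrm{ret}}(\theta_b + \Delta_{\mathrm{ret}}),
\qquad
\Delta_{\mathrm{gen}}^{*} = \arg\min_{\Delta_{\mathrm{gen}}} \mathcal{L}_{\mathrm{gen}}(\theta_b + \Delta_{\mathrm{gen}})

% Each update is low rank (per weight matrix W \in \mathbb{R}^{d \times k})
\Delta = BA, \quad B \in \mathbb{R}^{d \times r}, \; A \in \mathbb{R}^{r \times k}, \; r \ll \min(d, k)
```

Under this reading, (b) needs $\lambda$ because both losses pull on the same parameters, whereas in (c) each adapter minimizes only its own loss, so no weighting between the two losses is required.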