UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems

Hongru Wang, Wenyu Huang, Yang Deng, Rui Wang, Zezhong Wang, Yufei Wang, Fei Mi, Jeff Z. Pan, Kam-Fai Wong

TL;DR

This work tackles personalized, knowledge-grounded dialogue in a setting with multiple information sources. It introduces UniMS-RAG, a unified Seq2Seq framework that harnesses acting tokens to plan source usage, evaluation tokens to assess evidence relevance, and a self-refinement loop to iteratively improve responses. The approach jointly trains planning, retrieval, and generation, demonstrating state-of-the-art performance on DuLeMon and KBP across knowledge-source selection and response generation, with strong analyses and human evaluations. The framework highlights the value of unified multi-source retrieval-augmented generation and provides practical guidance on token-based control, relevance scoring, and inference-time refinement for personalized dialogue systems.

Abstract

Large Language Models (LLMs) have shown exceptional capabilities in many natural language understanding and generation tasks. However, personalization remains a much-coveted property, especially when multiple sources are involved in the dialogue system. To better plan and incorporate the use of multiple sources in generating personalized responses, we first decompose the problem into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation. We then propose a novel Unified Multi-Source Retrieval-Augmented Generation system (UniMS-RAG). Specifically, we unify these three sub-tasks, which have different formulations, into the same sequence-to-sequence paradigm during training, so that the model can adaptively retrieve evidence and evaluate its relevance on demand using special tokens, called acting tokens and evaluation tokens. Enabling language models to generate acting tokens facilitates interaction with various knowledge sources, allowing them to adapt their behavior to diverse task requirements. Meanwhile, evaluation tokens gauge the relevance score between the dialogue context and the retrieved evidence. In addition, we carefully design a self-refinement mechanism that iteratively refines the generated response by considering 1) the consistency scores between the generated response and the retrieved evidence; and 2) the relevance scores. Experiments on two personalized datasets (DuLeMon and KBP) show that UniMS-RAG achieves state-of-the-art performance on the knowledge source selection and response generation tasks, with the model itself serving as the retriever in a unified manner. Extensive analyses and discussions are provided to offer new perspectives on personalized dialogue systems.
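
The self-refinement idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the `Evidence` class, the `refine` function, the product used to combine the two scores, the threshold, and the toy `toy_generate` and `toy_consistency` helpers are all assumptions made for illustration. The only parts taken from the abstract are the two signals (relevance and consistency) and the fact that refinement is iterative.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    text: str
    relevance: float  # stand-in for the score an evaluation token would carry


def refine(
    context: str,
    evidence: List[Evidence],
    generate: Callable[[str, List[Evidence]], str],
    consistency: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    max_rounds: int = 3,     # hypothetical iteration budget
) -> str:
    """Regenerate the response while weakly supported evidence is dropped."""
    kept = list(evidence)
    response = generate(context, kept)
    for _ in range(max_rounds):
        # Combine relevance and consistency; the product is an assumed rule.
        passing = [
            e for e in kept
            if e.relevance * consistency(response, e.text) >= threshold
        ]
        if len(passing) == len(kept) or not passing:
            break  # nothing to drop, or nothing left to ground on
        kept = passing
        response = generate(context, kept)
    return response


def toy_generate(context: str, evidence: List[Evidence]) -> str:
    # Toy generator: echoes the evidence it was given.
    return " ".join(e.text for e in evidence)


def toy_consistency(response: str, evidence_text: str) -> float:
    # Toy consistency check: 1.0 if the evidence appears verbatim in the response.
    return 1.0 if evidence_text in response else 0.0


if __name__ == "__main__":
    pool = [Evidence("likes hiking", 0.9), Evidence("owns a cat", 0.2)]
    print(refine("hi", pool, toy_generate, toy_consistency))
```

In a real system, `generate` would be the trained model and `consistency` a learned or model-based scorer; the loop structure is the only point of the sketch.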

Paper Structure

This paper contains 38 sections, 7 equations, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Two typical examples of multi-source personalized knowledge-grounded dialogues: (top) an example from DuLeMon [dulemon]; (bottom) an example from KBP [wang2023large]. The same color marks a response and its corresponding grounded knowledge. The dialogue context is omitted for simplicity.
  • Figure 2: Our proposed method, UniMS-RAG, in which three optimization tasks are carefully designed: 1) Knowledge Source Selection; 2) Relevance Score Prediction; and 3) Response Generation. Orange indicates acting tokens and blue indicates evaluation tokens. All labels are available during training, so the three sub-tasks are optimized with teacher forcing.
  • Figure 3: The general framework for using UniMS-RAG, including 1) relevance score acquisition; 2) training stage; and 3) inference stage.
  • Figure 4: The instructions for the zero-shot retriever, which predicts similarity scores using off-the-shelf LLMs. The grey and yellow blocks indicate the inputs and outputs of the model.
  • Figure 5: Generation performance with different numbers of retrieved evidence on the two datasets.