
Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction

Zhaopei Huang, Qifeng Dai, Guozheng Wu, Xiaopeng Wu, Kehan Chen, Chuan Yu, Xubin Li, Tiezheng Ge, Wenxuan Wang, Qin Jin

TL;DR

This paper tackles long-term, memory-based personalization in service-oriented dialogue. It introduces PAL-Set, a synthesized Chinese dataset of multi-session user logs and dialogue histories, and PAL-Bench, a benchmark built on it for evaluating how well assistants capture user-specific requirements and preferences. It then proposes H²Memory, a hierarchical and heterogeneous memory framework that encodes interaction history into separate concrete- and abstract-level memory entries and uses retrieval-augmented generation to tailor responses. Experiments on PAL-Bench and an external dataset show that H²Memory improves requirement understanding, solution alignment, and multi-turn dialogue quality. The work advances practical personalized agents by combining structured memory with dedicated evaluation to model subjective user needs over extended interactions.

Abstract

With the rise of smart personal devices, service-oriented human-agent interactions have become increasingly prevalent. This trend highlights the need for personalized dialogue assistants that can understand user-specific traits to accurately interpret requirements and tailor responses to individual preferences. However, existing approaches often overlook the complexities of long-term interactions and fail to capture users' subjective characteristics. To address these gaps, we present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions. In the absence of available real-world data, we develop a multi-step LLM-based synthesis pipeline, which is further verified and refined by human annotators. This process yields PAL-Set, the first Chinese dataset comprising multi-session user logs and dialogue histories, which serves as the foundation for PAL-Bench. Furthermore, to improve personalized service-oriented interactions, we propose H²Memory, a hierarchical and heterogeneous memory framework that incorporates retrieval-augmented generation to improve personalized response generation. Comprehensive experiments on both our PAL-Bench and an external dataset demonstrate the effectiveness of the proposed memory framework.
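The synthesis process described above (multiple generation steps followed by verification and refinement) follows a generate-then-check pattern that can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's pipeline: the stage functions add_persona, add_events, and add_dialogue, the persona_check and persona_fix callables, and the record fields are all hypothetical stand-ins for LLM calls and annotator review.

```python
from typing import Callable

# A record is a dict of control information that grows stage by stage.
Record = dict[str, object]
Stage = Callable[[Record], Record]
Verify = Callable[[Record], list[str]]          # returns a list of detected issues
Refine = Callable[[Record, list[str]], Record]  # fixes a record given its issues


def synthesize(seed: Record, stages: list[Stage], verify: Verify,
               refine: Refine, max_rounds: int = 3) -> Record:
    """Progressively specify control info, then verify and refine the result."""
    record = dict(seed)
    for stage in stages:
        # Each stage sees everything specified so far and adds more detail.
        record = stage(record)
    for _ in range(max_rounds):
        issues = verify(record)
        if not issues:
            return record
        record = refine(record, issues)
    remaining = verify(record)
    if remaining:
        raise RuntimeError(f"unresolved after {max_rounds} rounds: {remaining}")
    return record


# Hypothetical stand-ins for LLM generation steps.
def add_persona(r: Record) -> Record:
    return {**r, "persona": "student who prefers quiet places"}


def add_events(r: Record) -> Record:
    return {**r, "events": ["exam week", "weekend trip"]}


def add_dialogue(r: Record) -> Record:
    return {**r, "dialogue": f"User asks for help planning a {r['events'][-1]}"}


# Hypothetical stand-ins for verification and (human or LLM) refinement.
def persona_check(r: Record) -> list[str]:
    return [] if str(r["persona"]) in str(r["dialogue"]) else ["dialogue ignores persona"]


def persona_fix(r: Record, issues: list[str]) -> Record:
    return {**r, "dialogue": f"{r['dialogue']} as a {r['persona']}"}


result = synthesize({"user_id": "u001"}, [add_persona, add_events, add_dialogue],
                    persona_check, persona_fix)
print(result["dialogue"])
```

In this toy run, the first verification pass flags the dialogue for ignoring the persona, and one refinement round fixes it, so the script prints "User asks for help planning a weekend trip as a student who prefers quiet places". Keeping verify and refine as pluggable callables lets the same loop host automatic checks or a human review step.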

Paper Structure

This paper contains 56 sections, 1 equation, 4 figures, and 11 tables.

Figures (4)

  • Figure 1: An example of our long-term, multi-session user-agent interaction data. The assistant is expected to leverage the historical interaction data (shown in lighter color) for memory modeling, enabling a more accurate understanding of user requirements and delivery of preference-aligned responses in the current dialogue.
  • Figure 2: Overview of the generative pipeline for PAL-Set. We design a multi-stage LLM-based synthesis process to progressively specify the control information for interaction record generation. Additional verification and refinement steps are employed to ensure the final data quality.
  • Figure 3: Overview of our method. We propose a hierarchical and heterogeneous memory mechanism (H²Memory) to model user characteristics in user–agent interactions. Information from different sources is separately encoded into concrete- and abstract-level memory entries. The most relevant entries from each part are retrieved to enable personalized, retrieval-augmented response generation.
  • Figure 4: Case study of the "Multi-turn Dialogue Interaction" task. The text highlighted in yellow and blue demonstrates that our method enables the assistant to more concretely incorporate the user’s personalized background in inferring requirements and to provide solutions that align with the user’s positive preferences. In contrast, the text highlighted in gray indicates that most of the baselines in this case propose solutions that contradict the user's preferences, reflecting a lack of ability to correctly understand the user's characteristics.
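The Figure 3 description (separate concrete- and abstract-level entries, with the most relevant entries retrieved from each part) can be illustrated with a minimal Python sketch. It assumes plain-text memory entries and uses bag-of-words cosine similarity as a stand-in for a learned text encoder; TwoLevelMemory, build_prompt, and the prompt layout are hypothetical and do not reproduce H²Memory's actual memory sources, encoding, or retrieval model. A real system over Chinese text would also need a proper tokenizer or embedding model, since `\w+` treats an unsegmented run of Chinese characters as a single token.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def _tokens(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real text encoder."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class TwoLevelMemory:
    """Hypothetical store with separate concrete- and abstract-level entries."""
    concrete: list[str] = field(default_factory=list)  # e.g. specific past events
    abstract: list[str] = field(default_factory=list)  # e.g. summarized traits/preferences

    def add(self, text: str, level: str) -> None:
        if level not in ("concrete", "abstract"):
            raise ValueError(f"unknown level: {level}")
        getattr(self, level).append(text)

    def _top_k(self, entries: list[str], query: str, k: int) -> list[str]:
        q = _tokens(query)
        ranked = sorted(entries, key=lambda e: _cosine(q, _tokens(e)), reverse=True)
        return ranked[:k]

    def retrieve(self, query: str, k: int = 2) -> dict[str, list[str]]:
        # Rank each level separately so neither level crowds out the other.
        return {
            "concrete": self._top_k(self.concrete, query, k),
            "abstract": self._top_k(self.abstract, query, k),
        }


def build_prompt(memory: TwoLevelMemory, user_turn: str, k: int = 2) -> str:
    """Assemble retrieved entries and the current turn into a generation prompt."""
    hits = memory.retrieve(user_turn, k)
    lines = ["[Concrete memory]"] + hits["concrete"]
    lines += ["[Abstract memory]"] + hits["abstract"]
    lines += ["[User]", user_turn]
    return "\n".join(lines)


mem = TwoLevelMemory()
mem.add("Booked a quiet hotel near the lake last spring", "concrete")
mem.add("Ordered spicy hotpot delivery on Friday night", "concrete")
mem.add("Prefers quiet places and dislikes crowds", "abstract")
mem.add("Enjoys spicy food", "abstract")
print(build_prompt(mem, "Find me a quiet hotel for the weekend", k=1))
```

Retrieving the top-k entries from each level separately, rather than from one merged pool, ensures the prompt always carries both specific past events and higher-level trait summaries. For the query above, the hotel booking is the closest concrete entry and the preference for quiet places is the closest abstract one.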