Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning

Xintong Li, Jalend Bantupalli, Ria Dharmani, Yuwei Zhang, Jingbo Shang

TL;DR

This paper tackles the problem of implicit reasoning in long-term personalized dialogue by introducing ImplexConv, a large-scale multi-session dataset with ~2,500 examples and ~100 sessions per example, and a hierarchical retrieval framework called TaciTree. ImplexConv embeds subtle opposed and supportive reasoning cues within persona-driven conversations to challenge retrieval-based and long-context models, while TaciTree organizes history into multi-level summaries to enable level-based, efficient retrieval of implicit knowledge. Empirical results show that TaciTree significantly improves retrieval accuracy and reduces token usage compared to MemoryBank and RAG baselines, particularly on the challenging ImplexConv dataset, highlighting the value of structured, hierarchical retrieval for long-range dependencies. The work advances practical capabilities for coherent, personalized AI assistants across many sessions and points to future avenues in adaptive retrieval and stronger reasoning models to handle nuanced implicit information.

Abstract

There has been a surge in the use of large language model (LLM) conversational agents to generate responses based on long-term history from multiple sessions. However, existing long-term open-domain dialogue datasets lack complex, real-world personalization and fail to capture implicit reasoning, where relevant information is embedded in subtle, syntactic, or semantically distant connections rather than explicit statements. In such cases, traditional retrieval methods fail to capture relevant context, and long-context modeling also becomes inefficient due to numerous complicated persona-related details. To address this gap, we introduce ImplexConv, a large-scale long-term dataset with 2,500 examples, each containing approximately 100 conversation sessions, designed to study implicit reasoning in personalized dialogues. Additionally, we propose TaciTree, a novel hierarchical tree framework that structures conversation history into multiple levels of summarization. Instead of brute-force searching all data, TaciTree enables an efficient, level-based retrieval process where models refine their search by progressively selecting relevant details. Our experiments demonstrate that TaciTree significantly improves the ability of LLMs to reason over long-term conversations with implicit contextual dependencies.

Paper Structure

This paper contains 33 sections, 5 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: An example from ImplexConv illustrating opposed (left) and supportive (right) implicit reasoning. The orange block is the user query, the red blocks are implicit scenarios with low semantic similarity to the query, and the blue blocks are noisy but lexically related conversations that obscure the correct response.
  • Figure 2: Overview of the TaciTree framework. TaciTree organizes long-term conversational history into a hierarchical structure, clustering related facts to enable efficient retrieval of implicit reasoning. By leveraging LLMs to refine relevant information while discarding unrelated details, the framework reduces search space and improves retrieval efficiency.
  • Figure 3: Distribution of implicitness scores across datasets, where Supp. and Opp. represent the supportive and opposed cases of ImplexConv, respectively.
  • Figure 4: Response accuracy (blue) and retrieved token size (orange) across different frameworks and datasets.
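
The level-based retrieval described in the abstract and in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the `Node` class, the `retrieve` function, and the `toy_judge` relevance check are hypothetical names introduced here, and the hand-written cue table only stands in for the LLM judgment that the paper describes. The sketch shows the general idea of descending a summary tree and expanding only the branches judged relevant, so that unrelated subtrees are never searched.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical tree node: leaves hold raw facts, inner nodes hold summaries."""
    text: str
    children: List["Node"] = field(default_factory=list)


def retrieve(root: Node, query: str,
             is_relevant: Callable[[str, str], bool]) -> List[str]:
    """Walk the tree level by level, expanding only children judged relevant."""
    frontier = [root]
    facts: List[str] = []
    while frontier:
        next_frontier: List[Node] = []
        for node in frontier:
            if not node.children:
                # A leaf that survived the filtering is returned as a fact.
                facts.append(node.text)
                continue
            next_frontier.extend(
                child for child in node.children
                if is_relevant(query, child.text)
            )
        frontier = next_frontier
    return facts


def toy_judge(query: str, text: str) -> bool:
    """Assumption: a hand-written cue table standing in for an LLM relevance call."""
    cues = {"hiking": {"health", "ankle"}}
    words = set(text.lower().replace(":", " ").split())
    return any(words & related
               for key, related in cues.items() if key in query.lower())


# Toy history: the ankle injury shares no words with the query,
# yet it is the fact that should shape the response.
root = Node("all sessions", [
    Node("health: injuries and recovery", [
        Node("user sprained an ankle last week"),
        Node("user has a physiotherapy appointment on friday"),
    ]),
    Node("food: cooking and restaurants", [
        Node("user likes spicy ramen"),
    ]),
])

facts = retrieve(root, "Suggest a weekend hiking trip", toy_judge)
print(facts)
```

In this toy run the food subtree is discarded at the first level, so its leaves are never compared against the query. A real system would replace `toy_judge` with a model call and would build the summaries automatically rather than by hand.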