Mixed-Session Conversation with Egocentric Memory
Jihyoung Jang, Taeyoung Kim, Hyounghun Kim
TL;DR
This work introduces Mixed-Session Conversation (MiSC), a dataset and benchmark designed to capture long-horizon dialogues in which the conversation partner changes across sessions, alongside EMMA, a memory-enabled agent that uses Egocentric Memory to preserve continuity. MiSC comprises 8.5K episodes, each with six sessions and four speakers, and is built with topic seeds, scenario generation, and memory-tagged dialogue to ensure coherence across sessions. EMMA combines a dialogue module (FLAN-T5-Large fine-tuned with QLoRA) and a retrieval module (CPM-based with BERT encoders) to generate and retrieve memories from the main speaker's perspective across partners, enabling consistent cross-session interactions. Human evaluations rate MiSC highly on dialogue quality (consistency and coherence) and memory quality (summarization, linking, tagging), and rate EMMA highly on humanness, engagingness, and memorability, especially when partners change across sessions. Overall, the MiSC-EMMA framework advances open-domain dialogue by combining long-term, multi-party continuity with scalable memory management for realistic multi-speaker conversations.
Abstract
Recently introduced dialogue systems have demonstrated high usability, but they still fall short of reflecting real-world conversation scenarios. In particular, current dialogue systems cannot replicate dynamic, continuous, long-term interactions involving multiple partners. This shortfall arises because few efforts have accounted for both aspects of real-world dialogue at once: deeply layered interactions over long-term dialogue and widely expanded conversation networks involving multiple participants. To incorporate both aspects, we introduce Mixed-Session Conversation, a dialogue system designed to construct conversations with various partners in a multi-session dialogue setup. We propose a new dataset called MiSC to implement this system. Each dialogue episode in MiSC consists of six consecutive sessions, with four speakers (one main speaker and three partners) appearing in each episode. We also propose a new dialogue model with a novel memory management mechanism, called Egocentric Memory Enhanced Mixed-Session Conversation Agent (EMMA). EMMA collects and retains memories from the main speaker's perspective during conversations with partners, enabling seamless continuity in subsequent interactions. Extensive human evaluations validate that the dialogues in MiSC demonstrate a seamless conversational flow, even when conversation partners change in each session. EMMA trained with MiSC is likewise found to maintain high memorability without contradiction throughout the entire conversation.
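
The episode layout described above (six sessions, one main speaker, three partners, and memories kept from the main speaker's perspective) can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the MiSC schema or the EMMA implementation: the class names `MemoryEntry`, `Session`, and `Episode`, the methods `remember` and `recall`, the partner order, and the example memories are all hypothetical, and the simple partner filter stands in for the learned retrieval module.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryEntry:
    session_id: int  # session in which the memory was formed
    partner: str     # partner the main speaker was talking to at the time
    text: str        # summary phrased from the main speaker's perspective


@dataclass
class Session:
    session_id: int
    partner: str
    # (speaker, utterance) pairs; left empty in this sketch
    utterances: list = field(default_factory=list)


@dataclass
class Episode:
    main_speaker: str
    sessions: list = field(default_factory=list)
    memory: list = field(default_factory=list)

    def partners(self) -> set:
        """Distinct partners the main speaker meets in this episode."""
        return {s.partner for s in self.sessions}

    def remember(self, session: Session, text: str) -> None:
        """Store a memory tagged with the session and partner it came from."""
        self.memory.append(MemoryEntry(session.session_id, session.partner, text))

    def recall(self, before_session: int, partner: Optional[str] = None) -> list:
        """Return memories formed before a session, optionally for one partner.

        A plain filter is used here as a placeholder for learned retrieval.
        """
        return [
            m for m in self.memory
            if m.session_id < before_session
            and (partner is None or m.partner == partner)
        ]


# Illustrative episode: six sessions, three partners (order is an assumption).
episode = Episode(main_speaker="Main")
for i, partner in enumerate(["A", "B", "C", "A", "B", "C"], start=1):
    episode.sessions.append(Session(session_id=i, partner=partner))

episode.remember(episode.sessions[0], "I told A that I am training for a marathon.")
episode.remember(episode.sessions[1], "I heard from B that A got a new job.")

carried_over = episode.recall(before_session=4)           # all earlier memories
with_a = episode.recall(before_session=4, partner="A")    # only those formed with A
```

Keeping every entry keyed to the main speaker, rather than storing a separate log per partner, is what would let something learned in a session with one partner be reused in a later session with another.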
