Prompted LLMs as Chatbot Modules for Long Open-domain Conversation

Gibbeum Lee; Volker Hartmann; Jongho Park; Dimitris Papailiopoulos; Kangwook Lee

TL;DR

The paper addresses the high cost and rigidity of fine-tuning large language models for open-domain dialogue. It introduces MPC, a modular prompted chatbot that uses separate LLM-driven modules for utterance clarification, memory processing, generation, and dialogue summarization, supplemented by a Dense Passage Retriever memory store and CoT-inspired reasoning. Through extensive human evaluations, MPC with pre-trained LLMs matches or surpasses fine-tuned BlenderBot 3 in long-form conversations, demonstrating strong long-term consistency and engagement without fine-tuning. The work underscores the potential of modular prompting and memory augmentation to build flexible, domain-agnostic chatbots, while acknowledging limitations in efficiency and language scope.

Abstract

In this paper, we propose MPC (Modular Prompted Chatbot), a new approach for creating high-quality conversational agents without the need for fine-tuning. Our method utilizes pre-trained large language models (LLMs) as individual modules for long-term consistency and flexibility, by using techniques such as few-shot prompting, chain-of-thought (CoT), and external memory. Our human evaluation results show that MPC is on par with fine-tuned chatbot models in open-domain conversations, making it an effective solution for creating consistent and engaging chatbots.
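
The four-module design summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable, the bracketed prompt tags, the `summarize_every` interval, and the word-overlap retriever (a simple stand-in for the Dense Passage Retriever memory store) are all assumptions made for illustration, and the real prompts would carry few-shot examples and chain-of-thought instructions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class ModularChatbot:
    llm: LLM
    memory: List[str] = field(default_factory=list)   # stored dialogue summaries
    history: List[str] = field(default_factory=list)  # raw turns
    summarize_every: int = 4  # assumed interval, not taken from the paper

    def _retrieve(self, query: str, k: int = 2) -> List[str]:
        # Stand-in for dense retrieval: rank memories by shared words.
        q = set(query.lower().split())
        ranked = sorted(
            self.memory,
            key=lambda m: len(q & set(m.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def respond(self, user_utterance: str) -> str:
        context = "\n".join(self.history[-self.summarize_every:])
        # 1. Utterance clarifier: resolve ambiguity using recent context.
        clarified = self.llm(
            f"[clarify]\nContext:\n{context}\nUtterance: {user_utterance}"
        )
        # 2. Memory processor: retrieve candidates, then let the LLM condense them.
        candidates = "\n".join(self._retrieve(clarified))
        relevant = self.llm(
            f"[memory]\nQuery: {clarified}\nMemories:\n{candidates}"
        )
        # 3. Utterance generator: produce the reply.
        reply = self.llm(
            f"[generate]\nContext:\n{context}\n"
            f"Relevant memory: {relevant}\nUser: {user_utterance}"
        )
        self.history += [f"User: {user_utterance}", f"Bot: {reply}"]
        # 4. Dialogue summarizer: periodically write a summary into memory.
        if len(self.history) % self.summarize_every == 0:
            chunk = "\n".join(self.history[-self.summarize_every:])
            self.memory.append(self.llm(f"[summarize]\n{chunk}"))
        return reply


def echo_llm(prompt: str) -> str:
    # Toy model for demonstration: reports which module was called.
    tag = prompt.split("]")[0].lstrip("[")
    return f"<{tag} output>"
```

Replacing `echo_llm` with a call to a real pre-trained model is all that is needed to try the pipeline; because every module is only a prompt, no component requires fine-tuning.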

Paper Structure (38 sections, 6 figures, 5 tables)

Introduction
Our Contributions
Related Work
Modular Prompting
Open-domain Chatbots
Long-term Memory
Modular Prompted Chatbot
Utterance Clarifier
Memory Processor
Utterance Generator
Dialogue Summarizer
Experimental Setup
Single Model Evaluation
Pairwise Models Evaluation
Implicit Persona
...and 23 more sections

Figures (6)

Figure 1: Our modular design for improving long-term consistency in open-domain conversation.
Figure 2: The average score given by the MTurk worker group minus the average score given by the university student group. The two subgroups are very similar on average across metrics, though sensibleness shows the greatest difference. Students generally score chatbot models slightly more harshly. BB3-30B is an outlier, which students score significantly lower than MTurk workers do.
Figure 3: Evaluation form for a single model.
Figure 4: Evaluation form for pairwise model comparison.
Figure 5: We display this page before the evaluators start the evaluation process to inform them about the task and gather their consent for data usage.
...and 1 more figure
