Fixed-Persona SLMs with Modular Memory: Scalable NPC Dialogue on Consumer Hardware

Martin Braas, Lukas Esterle

TL;DR

This paper tackles the challenge of on-device, memory-rich NPC dialogue by combining LoRA-fine-tuned Small Language Models with runtime-swappable modular memory. It decouples fixed persona behavior from dynamic memory, enabling a single base model to drive multiple distinct NPC instances on consumer hardware. A retrieval-augmented pipeline integrates conversational memory and world knowledge stored in ChromaDB, supporting long-term coherence without retraining. Across three open-source SLMs, the study demonstrates feasible on-device deployment, quantifies the trade-offs between model size, latency, and memory usage, and highlights the practical benefits of memory modularity for scalable, expressive NPC interactions in games and other domains.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, yet their applicability to dialogue systems in computer games remains limited. This limitation arises from their substantial hardware requirements, latency constraints, and the necessity to maintain clearly defined knowledge boundaries within a game setting. In this paper, we propose a modular NPC dialogue system that leverages Small Language Models (SLMs), fine-tuned to encode specific NPC personas and integrated with runtime-swappable memory modules. These memory modules preserve character-specific conversational context and world knowledge, enabling expressive interactions and long-term memory without retraining or model reloading during gameplay. We comprehensively evaluate our system using three open-source SLMs: DistilGPT-2, TinyLlama-1.1B-Chat, and Mistral-7B-Instruct, trained on synthetic persona-aligned data and benchmarked on consumer-grade hardware. While our approach is motivated by applications in gaming, its modular design and persona-driven memory architecture hold significant potential for broader adoption in domains requiring expressive, scalable, and memory-rich conversational agents, such as virtual assistants, customer support bots, or interactive educational systems.
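
The separation between a fixed persona model and runtime-swappable memory can be illustrated with a small sketch. This is not the authors' implementation: the class names (`MemoryModule`, `NPCDialogue`), the `stub_model` generator, and the sample facts are all hypothetical, and a toy bag-of-words similarity stands in for the ChromaDB vector store and embedding model the paper uses. The point is only that one shared generator can serve several NPCs when the memory module is exchanged between turns.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a sentence encoder."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class MemoryModule:
    """Hypothetical per-NPC store: world knowledge plus conversation history."""
    npc_name: str
    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the k entries most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, embed(e)), reverse=True)
        return ranked[:k]


class NPCDialogue:
    """One shared generator; only the attached memory changes per NPC."""

    def __init__(self, generate):
        self.generate = generate  # stands in for the fine-tuned SLM, loaded once
        self.memory = None

    def swap_memory(self, module: MemoryModule) -> None:
        self.memory = module  # no retraining or model reload involved

    def reply(self, player_utterance: str):
        context = self.memory.retrieve(player_utterance)
        prompt = "\n".join(
            ["Known facts:", *context,
             f"Player: {player_utterance}", f"{self.memory.npc_name}:"]
        )
        answer = self.generate(prompt)
        # Persist the exchange so later turns can retrieve it.
        self.memory.add(f"Player said: {player_utterance}")
        self.memory.add(f"{self.memory.npc_name} said: {answer}")
        return prompt, answer


def stub_model(prompt: str) -> str:
    """Placeholder generator (assumption); swap in a real model call here."""
    return "(model output)"


blacksmith = MemoryModule("Blacksmith", [
    "The forge is north of the market.",
    "Iron ore comes from the Grey Hills mine.",
])
innkeeper = MemoryModule("Innkeeper", [
    "Rooms cost five silver per night.",
    "The inn serves stew every evening.",
])

npc = NPCDialogue(stub_model)
npc.swap_memory(blacksmith)
prompt_a, _ = npc.reply("Where does the iron ore come from?")
npc.swap_memory(innkeeper)
prompt_b, _ = npc.reply("How much do rooms cost?")
```

Because each NPC's facts live only in its own module, the second prompt contains nothing from the blacksmith's store, which is one simple way to keep knowledge boundaries between characters.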

Paper Structure

This paper contains 29 sections, 12 figures, and 1 table.

Figures (12)

  • Figure 1: Diagram depicting the runtime dialogue pipeline.
  • Figure 2: Factual accuracy of NPC responses across different model variants.
  • Figure 3: Context retention performance of each NPC variant, measured by the percentage of correct keyword recalls in multi-turn conversations.
  • Figure 4: Accuracy of world knowledge retrieval by NPC model, illustrating the percentage of correctly retrieved entries from memory databases.
  • Figure 5: Average grammatical and stylistic errors per response produced by NPC models, as measured by LanguageTool. Lower values indicate better fluency.
  • ...and 7 more figures
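
The Figure 3 caption describes context retention as the percentage of correct keyword recalls in multi-turn conversations. The exact scoring procedure is not given in this listing, so the following is only one plausible reading: for each turn, count how many expected keywords appear in the NPC response, then average over turns. The function names, the whole-word matching rule, and the sample turns are assumptions made for illustration.

```python
import re


def keyword_recall(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords found in the response (case-insensitive, whole words)."""
    if not expected_keywords:
        return 0.0
    text = response.lower()
    hits = sum(
        1 for kw in expected_keywords
        if re.search(rf"\b{re.escape(kw.lower())}\b", text)
    )
    return hits / len(expected_keywords)


def context_retention(turns: list[tuple[str, list[str]]]) -> float:
    """Mean per-turn keyword recall, expressed as a percentage."""
    scores = [keyword_recall(response, kws) for response, kws in turns]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# Hypothetical evaluation data: (NPC response, keywords it should recall).
turns = [
    ("You told me your name is Aria and you seek the lost sword.", ["Aria", "sword"]),
    ("I do not remember where you came from.", ["Riverton"]),
]
score = context_retention(turns)  # first turn recalls 2/2, second 0/1
```

Averaging per turn weights every turn equally regardless of how many keywords it carries; pooling all keywords across turns would be an equally defensible alternative.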