Fixed-Persona SLMs with Modular Memory: Scalable NPC Dialogue on Consumer Hardware
Martin Braas, Lukas Esterle
TL;DR
This paper tackles the challenge of on-device, memory-rich NPC dialogue by combining LoRA-fine-tuned Small Language Models with runtime-swappable modular memory. It decouples fixed persona behavior from dynamic memory, enabling a single base model to drive multiple distinct NPC instances on consumer hardware. A retrieval-augmented pipeline integrates conversational memory and world knowledge stored in ChromaDB, supporting long-term coherence without retraining. Across three open-source SLMs, the study demonstrates feasible on-device deployment, quantifies the trade-offs between model size, latency, and memory usage, and highlights the practical benefits of memory modularity for scalable, expressive NPC interactions in games and other domains.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, yet their applicability to dialogue systems in computer games remains limited. This limitation arises from their substantial hardware requirements, their high inference latency, and the need to maintain clearly defined knowledge boundaries within a game setting. In this paper, we propose a modular NPC dialogue system that leverages Small Language Models (SLMs), fine-tuned to encode specific NPC personas and integrated with runtime-swappable memory modules. These memory modules preserve character-specific conversational context and world knowledge, enabling expressive interactions and long-term memory without retraining or model reloading during gameplay. We evaluate our system using three open-source SLMs: DistilGPT-2, TinyLlama-1.1B-Chat, and Mistral-7B-Instruct, trained on synthetic persona-aligned data and benchmarked on consumer-grade hardware. While our approach is motivated by applications in gaming, its modular design and persona-driven memory architecture could extend to other domains requiring expressive, scalable, and memory-rich conversational agents, such as virtual assistants, customer support bots, or interactive educational systems.
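To make the separation between a shared model and per-character memory concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single text-generation callable shared by all NPCs and one memory module per NPC that can be swapped at runtime. The names `MemoryModule`, `NPCDialogueSystem`, `register`, `swap`, and `respond` are hypothetical, and the bag-of-words retrieval is a dependency-free stand-in for the embedding search that a vector store such as ChromaDB would provide.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


def _tokens(text: str) -> Counter:
    # Toy tokenizer: lowercase word counts (stand-in for a real embedding).
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class MemoryModule:
    """Hypothetical per-NPC store of conversation turns and world knowledge."""

    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Return up to k entries with non-zero similarity to the query.
        q = _tokens(query)
        scored = [(_cosine(q, _tokens(e)), e) for e in self.entries]
        ranked = sorted(scored, key=lambda s: s[0], reverse=True)
        return [e for score, e in ranked[:k] if score > 0]


class NPCDialogueSystem:
    """One shared model callable, many swappable memory modules (assumed design)."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self.generate = generate  # shared model: prompt -> reply
        self.memories: dict[str, MemoryModule] = {}
        self.active: Optional[str] = None

    def register(self, npc_id: str) -> None:
        self.memories[npc_id] = MemoryModule()

    def swap(self, npc_id: str) -> None:
        # Switching NPCs only changes which memory is consulted;
        # the model itself is neither reloaded nor retrained.
        if npc_id not in self.memories:
            raise KeyError(f"unknown NPC: {npc_id}")
        self.active = npc_id

    def respond(self, player_utterance: str) -> str:
        if self.active is None:
            raise RuntimeError("no active NPC; call swap() first")
        memory = self.memories[self.active]
        context = memory.retrieve(player_utterance)
        prompt = "\n".join(
            ["Relevant memory:", *context, f"Player: {player_utterance}", "NPC:"]
        )
        reply = self.generate(prompt)
        # Store the new turn so later retrievals can surface it.
        memory.add(f"Player: {player_utterance}")
        memory.add(f"NPC: {reply}")
        return reply
```

In this sketch, calling `swap("innkeeper")` after talking to a blacksmith leaves the blacksmith's stored turns untouched and invisible to the innkeeper, which is one way to picture the knowledge boundaries the abstract describes. A real deployment would pass in an actual SLM inference function as `generate` and replace the toy similarity with embedding-based search.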
