Efficient Tool-Calling Multi-Expert NPC Agent for Commonsense Persona-Grounded Dialogue
Mahammad Nuriyev
TL;DR
This work tackles the dual challenge of building NPCs that both hold natural, contextually grounded dialogue and take environment-interacting actions, under strict latency constraints and on limited GPU resources. It introduces a three-expert architecture built on a Qwen3 base model with LoRA adapters: ToolLoRA for tool calling, NoLoRA for direct replies, and PersonaLoRA for integrating tool outputs into fluent responses. The architecture is paired with an optimized inference pipeline and aggressive data augmentation. Key contributions include a detailed training and augmentation strategy, a robust inference workflow, and competitive performance in the CPDC 2025 challenge, along with concrete efficiency figures (e.g., average turn times of 3 s and under 30 GB of VRAM). The results suggest practical benefits for deploying responsive, tool-using NPCs in real-time interactive systems, and the work outlines concrete directions (knowledge graphs, constrained generation) for further improving reliability and efficiency.
Abstract
We present a multi-expert system for creating Non-Player Characters (NPCs) capable of both natural dialogue and contextual action execution in interactive environments. Using Qwen3 as the base model and Low-Rank Adaptation (LoRA) adapters, we instantiate three specialists: tool calling, tool-response interpretation, and direct dialogue. Our system comfortably meets the computational efficiency requirements, delivering fast responses and maintaining modest resource usage on L40S GPUs. In the Commonsense Persona-Grounded Dialogue Challenge 2025, our method ranked second overall. Code available at: https://github.com/MahammadNuriyev62/CPDC-challenge-2025-solution/
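To make the division of labor among the three specialists concrete, the following is a minimal sketch of how a single NPC turn might be routed. It is not the authors' implementation: the routing order (tool-calling expert first, then either the direct-reply expert or the tool-response expert), the function `npc_turn`, the `ToolCall` type, the stub experts, and the `check_price` tool are all hypothetical names introduced for illustration. In a real system each expert would presumably be a generation call on the shared Qwen3 base model with a different adapter active; here they are plain Python callables so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """A hypothetical structured tool request emitted by the tool-calling expert."""
    name: str
    arguments: Dict[str, str] = field(default_factory=dict)


# Assumed expert signatures (illustrative only, not taken from the paper).
ToolExpert = Callable[[str], List[ToolCall]]      # stands in for ToolLoRA
ReplyExpert = Callable[[str], str]                # stands in for NoLoRA
PersonaExpert = Callable[[str, List[str]], str]   # stands in for PersonaLoRA


def npc_turn(
    user_message: str,
    tool_expert: ToolExpert,
    reply_expert: ReplyExpert,
    persona_expert: PersonaExpert,
    tools: Dict[str, Callable[..., str]],
) -> str:
    """Route one dialogue turn through the three experts (assumed ordering)."""
    calls = tool_expert(user_message)
    if not calls:
        # No tool is needed: answer directly.
        return reply_expert(user_message)
    # Execute only the requested tools that actually exist.
    results = [tools[c.name](**c.arguments) for c in calls if c.name in tools]
    # Fold the tool outputs into an in-character reply.
    return persona_expert(user_message, results)


# Stub experts and a stub tool, standing in for model calls and game functions.
def stub_tool_expert(message: str) -> List[ToolCall]:
    if "price" in message.lower():
        return [ToolCall("check_price", {"item": "sword"})]
    return []


def stub_reply_expert(message: str) -> str:
    return "Welcome, traveler."


def stub_persona_expert(message: str, results: List[str]) -> str:
    return "Let me see... " + "; ".join(results)


TOOLS = {"check_price": lambda item: f"{item} costs 50 gold"}

greeting = npc_turn("Hello", stub_tool_expert, stub_reply_expert,
                    stub_persona_expert, TOOLS)
quote = npc_turn("What is the price?", stub_tool_expert, stub_reply_expert,
                 stub_persona_expert, TOOLS)
```

With these stubs, a greeting takes the direct-reply path and a price question takes the tool path. Keeping the experts behind a common callable interface is one way to swap adapters on a single base model without changing the turn logic.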
