Efficient Tool-Calling Multi-Expert NPC Agent for Commonsense Persona-Grounded Dialogue

Mahammad Nuriyev

TL;DR

This work tackles a dual challenge: building NPCs that hold natural, contextually grounded conversations and also take actions in their environment, all within strict latency limits on limited GPU hardware. It introduces a three-expert architecture on a Qwen3 base model with LoRA adapters (ToolLoRA for tool calling, NoLoRA for direct replies, and PersonaLoRA for weaving tool outputs into fluent responses), together with an optimized inference pipeline and aggressive data augmentation. Key contributions include a detailed training and augmentation strategy, a robust inference workflow, and competitive performance in CPDC 2025 (second place overall), with concrete efficiency figures: an average turn time of 3 s and under 30 GB of VRAM. The results point to practical benefits for deploying responsive, tool-using NPCs in real-time interactive systems, and the work outlines concrete directions for further improving reliability and efficiency, such as knowledge graphs and constrained generation.
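The three-expert routing described above can be sketched as follows. This is a minimal sketch, not the author's implementation: it assumes that the tool-calling expert (ToolLoRA) also makes the tool/no-tool decision, that at most one tool call happens per turn, and that each expert is a plain Python callable. In the real system each expert would be a generation call on the Qwen3 base with the corresponding LoRA adapter active; here `direct_expert` stands in for NoLoRA and `persona_expert` for PersonaLoRA. The `Turn` type, the toy experts, and the `check_price` tool are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """One dialogue turn: the player's utterance plus any tool outputs gathered so far."""
    player_utterance: str
    tool_results: List[str] = field(default_factory=list)


# Hypothetical expert interface: each expert maps a turn to generated text.
Expert = Callable[[Turn], str]


def run_turn(turn: Turn,
             tool_expert: Expert,
             persona_expert: Expert,
             direct_expert: Expert,
             execute_tool: Callable[[str], str]) -> str:
    """Route one turn through the three experts (assumed control flow)."""
    # Decision phase (assumption): the tool-calling expert returns a tool call,
    # or an empty string when no tool is needed.
    tool_call = tool_expert(turn)
    if tool_call:
        # Tool path: run the call, then let the persona expert turn the raw
        # tool output into an in-character reply.
        turn.tool_results.append(execute_tool(tool_call))
        return persona_expert(turn)
    # No-tool path: the direct-dialogue expert answers on its own.
    return direct_expert(turn)


# Toy stand-ins for the LoRA experts and the game's tool executor (all hypothetical).
def toy_tool_expert(turn: Turn) -> str:
    return "check_price('sword')" if "price" in turn.player_utterance else ""


def toy_persona_expert(turn: Turn) -> str:
    return f"Aye, that blade costs {turn.tool_results[-1]}."


def toy_direct_expert(turn: Turn) -> str:
    return "Welcome, traveler! Browse as you like."


def toy_execute_tool(call: str) -> str:
    return "50 gold"


components = (toy_tool_expert, toy_persona_expert, toy_direct_expert, toy_execute_tool)
print(run_turn(Turn("What's the price of that sword?"), *components))
print(run_turn(Turn("Hello there!"), *components))
```

The first call takes the tool path and prints "Aye, that blade costs 50 gold."; the second takes the no-tool path and prints "Welcome, traveler! Browse as you like." Keeping the decision inside the tool expert means a no-tool turn costs a single extra decision step before the direct reply; whether the paper's decision phase works exactly this way is an assumption.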

Abstract

We present a multi-expert system for building Non-Player Characters (NPCs) that can both hold natural dialogue and execute contextual actions in interactive environments. Using Qwen3 as the base model and Low-Rank Adaptation (LoRA) adapters, we instantiate three specialists: tool calling, tool-response interpretation, and direct dialogue. The system comfortably meets the challenge's computational efficiency requirements on L40S GPUs, responding quickly while keeping resource usage modest. In the Commonsense Persona-Grounded Dialogue Challenge (CPDC) 2025, our method ranked second overall. Code is available at: https://github.com/MahammadNuriyev62/CPDC-challenge-2025-solution/
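Because all three specialists are LoRA adapters on a single Qwen3 base, they can share the frozen base weights and differ only in small low-rank updates. The sketch below illustrates the standard LoRA update h = W0 x + (alpha / r) B A x in plain Python: one toy weight W0 is shared, and swapping the (A, B) pair selects the expert. It is an illustration of LoRA in general under stated assumptions, not the paper's code: the adapter names, toy matrices, alpha, rank, and the 4096 x 4096 layer size are hypothetical and do not reflect the actual Qwen3 or adapter configuration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """LoRA layer: h = W0 x + (alpha / rank) * B (A x).

    W0 (d_out x d_in) is the frozen base weight shared by every expert;
    A (rank x d_in) and B (d_out x rank) are the small trainable factors.
    """
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [h + scale * d for h, d in zip(base, delta)]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters one adapter adds to one d_out x d_in matrix."""
    return rank * (d_in + d_out)


# Toy 2x2 base weight (identity), shared by all experts and never modified.
W0: Matrix = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical rank-1 adapters given as (A, B); names and values are illustrative only.
ADAPTERS: Dict[str, Tuple[Matrix, Matrix]] = {
    "tool": ([[1.0, 0.0]], [[0.0], [1.0]]),
    "persona": ([[0.0, 1.0]], [[1.0], [0.0]]),
    "direct": ([[0.5, 0.5]], [[1.0], [1.0]]),
}

x = [1.0, 2.0]
for name, (a, b) in ADAPTERS.items():
    # Same input, same frozen W0: only the selected adapter changes the output.
    print(name, lora_forward(W0, a, b, x, alpha=2.0, rank=1))

# Hypothetical layer size (not Qwen3's actual configuration): adapter vs full matrix.
d, r = 4096, 16
print(lora_param_count(d, d, r), d * d)  # 131072 16777216
```

With these toy values the three adapters map the same input [1.0, 2.0] to [1.0, 4.0], [5.0, 2.0] and [4.0, 5.0] while W0 stays untouched. At the hypothetical 4096 x 4096, rank-16 size, an adapter adds 131,072 parameters per matrix, 1/128 of the 16,777,216 in the full matrix, which is why several experts can sit on one base model far more cheaply than several full model copies.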

Paper Structure

This paper contains 28 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Multi-Expert System Architecture showing the complete inference pipeline with decision phase, tool/no-tool paths, and the three expert models