Empowering LLMs in Task-Oriented Dialogues: A Domain-Independent Multi-Agent Framework and Fine-Tuning Strategy
Zihao Feng, Xiaoxue Wang, Bowen Wu, Weihong Zhong, Zhen Xu, Hailong Cao, Tiejun Zhao, Ying Li, Baoxun Wang
TL;DR
This paper addresses the challenge of building scalable, domain-general task-oriented dialogue (TOD) systems in a setting where fine-tuned lightweight LLMs lag behind large LLMs used as a single monolithic agent. It introduces the Domain-Independent Multi-Agent Framework (DIMF), which splits TOD into three domain-agnostic components (Intent Classification, Slot Filling, and Response) and pairs them with a Data Distribution Adaptation (DDA) strategy that stabilizes Direct Preference Optimisation (DPO) training. On MultiWOZ 2.2, DIMF with DPO-DDA achieves the best average performance among the compared traditional models and prior LLM-based approaches, and it shows strong zero-shot generalization. The work indicates that modularizing TOD tasks and carefully balancing the training data for DPO can yield robust performance with smaller LLMs, which matters for dialogue systems that must span many domains. The methodology combines structured prompts, inheritance of dialogue history, and reward-based fine-tuning to improve reasoning and policy guidance in lightweight models. The key performance metric is $\text{Combine} = \frac{\text{Inform} + \text{Success}}{2} + \text{BLEU}$, reported alongside diversity and richness measures, so the evaluation covers both task accuracy and response fluency.
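As a worked illustration of the Combine metric named in the TL;DR, the minimal sketch below evaluates the formula directly. The function name `combined_score` and the input values are assumptions chosen for illustration; they are not taken from the paper, and all three inputs are assumed to be on the same 0-100 scale.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Return Combine = (Inform + Success) / 2 + BLEU.

    All three inputs are assumed to be percentages on a 0-100 scale.
    """
    return (inform + success) / 2 + bleu


# Illustrative numbers only; these are NOT results reported in the paper.
example = combined_score(inform=80.0, success=70.0, bleu=10.0)
print(example)
```

Because Inform and Success are averaged while BLEU is added in full, a one-point change in BLEU moves the score twice as much as a one-point change in either Inform or Success.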
Abstract
Task-oriented dialogue systems based on Large Language Models (LLMs) have gained increasing attention across various industries and achieved significant results. Current approaches condense complex procedural workflows into a single agent, which achieves satisfactory performance with large-scale LLMs. However, these approaches struggle to reach comparable performance with fine-tuned lightweight LLMs, because such models have limited capacity for handling several pieces of complex logic at once. In this work, we design a Domain-Independent Multi-Agent Framework (DIMF), which comprises an Intent Classification Agent, a Slot Filling Agent, and a Response Agent. Separating the task into domain-independent components reduces learning complexity and improves generalization. Within this framework, we strengthen contextual understanding using the Direct Preference Optimisation (DPO) method, and we propose a simple and effective Data Distribution Adaptation (DDA) method to mitigate degradation during DPO training. Experiments on the MultiWOZ datasets show that our proposed method achieves better average performance than all baselines. Extensive analysis also demonstrates that the proposed framework generalizes well and has strong zero-shot capability.
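For readers unfamiliar with DPO, the sketch below computes the standard per-example DPO loss from sequence log-probabilities. It shows the generic objective only: it does not implement the paper's DDA method, and the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default; the paper's value is not given here
) -> float:
    """Standard per-example DPO loss: -log(sigmoid(beta * margin)).

    margin is how much more the policy prefers the chosen response over the
    rejected one, measured relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
```

The loss falls below log(2) once the policy assigns relatively more probability to the chosen response than the reference model does, and rises above it in the opposite case.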
