Empowering LLMs in Task-Oriented Dialogues: A Domain-Independent Multi-Agent Framework and Fine-Tuning Strategy

Zihao Feng, Xiaoxue Wang, Bowen Wu, Weihong Zhong, Zhen Xu, Hailong Cao, Tiejun Zhao, Ying Li, Baoxun Wang

TL;DR

This paper addresses the challenge of building scalable, domain-general task-oriented dialogue (TOD) systems in a setting where fine-tuned lightweight LLMs lag behind large, monolithic LLMs. It introduces the Domain-Independent Multi-Agent Framework (DIMF), which splits TOD into three domain-agnostic components (Intent Classification, Slot Filling, and Response) and pairs them with a Data Distribution Adaptation (DDA) strategy that stabilizes Direct Preference Optimisation (DPO) training. On MultiWOZ 2.2, DIMF trained with DPO and DDA achieves the best overall performance and strong zero-shot generalization, outperforming traditional models and prior LLM-based approaches. The work shows that modularizing TOD tasks and carefully balancing the DPO training data can yield robust performance with smaller LLMs, which matters for scalable dialogue systems that span many domains. The methodology combines structured prompts, inheritance of dialogue history, and reward-based fine-tuning to improve reasoning and policy guidance in lightweight models. The combined score, $\mathrm{Combine} = \frac{\mathrm{Inform} + \mathrm{Success}}{2} + \mathrm{BLEU}$, is used as a key performance metric alongside diversity and richness measures, so the evaluation covers both accuracy and fluency.
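The combined metric mentioned above is simple arithmetic, and a minimal sketch makes the weighting explicit. The function name `combined_score` and the sample numbers are illustrative assumptions; they are not taken from the paper's code or results.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Return (Inform + Success) / 2 + BLEU.

    All three inputs are assumed to be on the same 0-100 scale.
    """
    return (inform + success) / 2 + bleu


# Illustrative numbers only; these are NOT results reported in the paper.
example = combined_score(inform=80.0, success=70.0, bleu=10.0)
print(example)  # 85.0
```

Because Inform and Success are averaged while BLEU is added at full weight, a one-point gain in BLEU moves the combined score as much as a two-point gain in either Inform or Success.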

Abstract

Task-oriented dialogue systems based on Large Language Models (LLMs) have gained increasing attention across various industries and achieved significant results. Current approaches condense complex procedural workflows into a single agent to achieve satisfactory performance on large-scale LLMs. However, these approaches struggle to achieve comparable performance on fine-tuned lightweight LLMs, because such models have limited capability to handle multiple pieces of complex logic at once. In this work, we design a Domain-Independent Multi-Agent Framework (DIMF), which contains an Intent Classification Agent, a Slot Filling Agent, and a Response Agent. This approach reduces learning complexity and improves generalization by separating the task into domain-independent components. Within this framework, we strengthen contextual understanding using the Direct Preference Optimisation (DPO) method, and propose a simple and effective Data Distribution Adaptation (DDA) method to mitigate degradation issues during DPO training. Experiments conducted on the MultiWOZ datasets show that our proposed method achieves better average performance than all the baselines. Extensive analysis also demonstrates that our proposed framework exhibits excellent generalizability and zero-shot capability.
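To make the three-agent split concrete, the sketch below chains an intent step, a slot-filling step, and a response step, with the domain supplied only through the prompt. It is an illustration under stated assumptions, not the authors' implementation: the prompt tags, the `SCHEMAS` table, the `key=value` output format, the sequential control flow, and the rule-based `stub_llm` standing in for a fine-tuned model are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt string to text.
LLM = Callable[[str], str]

# Hypothetical slot schemas; a new domain is added here, not by retraining.
SCHEMAS: Dict[str, List[str]] = {
    "restaurant": ["area", "food"],
    "hotel": ["area", "stars"],
}


@dataclass
class DialogueState:
    history: List[str] = field(default_factory=list)
    slots: Dict[str, str] = field(default_factory=dict)


def classify_intent(llm: LLM, state: DialogueState, user_turn: str) -> str:
    """Intent step: pick one of the known domains, or 'unknown'."""
    prompt = f"[intent] domains={sorted(SCHEMAS)} history={state.history} user={user_turn}"
    answer = llm(prompt).strip().lower()
    return answer if answer in SCHEMAS else "unknown"


def fill_slots(llm: LLM, state: DialogueState, user_turn: str, domain: str) -> Dict[str, str]:
    """Slot step: the same agent serves any domain; only the schema in the prompt changes."""
    allowed = SCHEMAS[domain]
    prompt = f"[slots] schema={allowed} history={state.history} user={user_turn}"
    slots: Dict[str, str] = {}
    for pair in llm(prompt).split(";"):  # assumed format: "key=value; key=value"
        if "=" not in pair:
            continue
        key, value = (part.strip() for part in pair.split("=", 1))
        if key in allowed and value:
            slots[key] = value
    return slots


def respond(llm: LLM, state: DialogueState, user_turn: str, domain: str) -> str:
    """Response step: generate the system reply from the accumulated slots."""
    prompt = f"[response] domain={domain} slots={state.slots} user={user_turn}"
    return llm(prompt).strip()


def run_turn(llm: LLM, state: DialogueState, user_turn: str) -> str:
    """Run one dialogue turn through the three steps and update the state."""
    domain = classify_intent(llm, state, user_turn)
    if domain in SCHEMAS:
        state.slots.update(fill_slots(llm, state, user_turn, domain))
    reply = respond(llm, state, user_turn, domain)
    state.history += [f"User: {user_turn}", f"System: {reply}"]
    return reply


def stub_llm(prompt: str) -> str:
    """Rule-based stand-in for a model, used only to make the sketch runnable."""
    if prompt.startswith("[intent]"):
        return "restaurant"
    if prompt.startswith("[slots]"):
        return "area=centre; food=italian"
    return "I found an italian restaurant in the centre."


state = DialogueState()
reply = run_turn(stub_llm, state, "I want italian food in the centre")
```

In this sketch each step sees only its own short prompt, which is the kind of reduced per-agent complexity the abstract describes; in practice each call would go to a separately fine-tuned agent rather than to one stub.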

Paper Structure

This paper contains 28 sections, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Different architectures of our proposed system and other LLM-based systems. The left part is other LLM-based systems and the right is ours. The information in the orange box indicates the strategies in different sub-tasks that the agent needs to follow.
  • Figure 2: The main framework of our proposed method. The left part is the framework of our proposed DIMF. We train three agents to collaboratively solve users' questions and provide responses. Each agent can fulfill different user needs through different prompts, instead of training domain-specific agents (as indicated by the agents in the left part such as "Restaurant"). The right part is the framework of our training process for each agent. We first fine-tune the model with the training set, and then leverage the validation dataset to complete the DPO process.
  • Figure 3: The rewards of the chosen data and rejected data during DPO training of the Slot Filling Agent. The left figure shows the original DPO method and the right one shows our proposed DDA method. The red line marks a reward of 0.
  • Figure 4: Results of DIMF after removing training data from a specific domain. The first sub-figure shows the results of the whole system after removing each domain. The other sub-figures show the performance on each domain after removing a specific domain.
  • Figure 5: An example of one round of conversation between a user and our DIMF. The case contains the conversation history, the user's question, and the generation process of DIMF trained with different methods. Red words mark incorrect information and responses, and green words mark correct ones.
  • ...and 2 more figures