MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking

Tianwen Tang; Tong Zhu; Haodong Liu; Yin Bai; Jia Cheng; Wenliang Chen

TL;DR

This paper tackles zero-shot dialogue state tracking (DST) by addressing two failure modes: poor domain transfer and partial predictions. It introduces Mixture of Prefix Experts (MoPE), which clusters slots into $K$ groups using $k$-means and assigns a dedicated deep prefix-prompt expert $\Phi_k$ to each cluster, while keeping the backbone model frozen and training only the prompts. The approach achieves average joint goal accuracies of $57.13\%$ on MultiWOZ2.1 and $55.40\%$ on SGD, and shows that specialized prefixes, the choice of slot-feature representation, and an appropriate number of clusters are key to its performance. MoPE provides a parameter-efficient, plug-in solution that improves cross-domain slot transfer and reduces partial predictions, with practical implications for scalable zero-shot DST across diverse dialogue domains.
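The clustering-and-routing idea summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 2-D slot features are made-up toy values, the names `kmeans`, `route`, and `prefix_experts` are hypothetical, and routing an unseen slot to its nearest centroid is an assumption made for the sketch.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of each cluster's members (keep old centroid if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return assign, centroids

def route(feature, centroids):
    """Pick the cluster (and hence the prefix expert) nearest to a slot feature."""
    return min(range(len(centroids)), key=lambda c: math.dist(feature, centroids[c]))

# Toy slot features (hypothetical values; real features would be learned embeddings).
slot_features = {
    "hotel-area": (0.9, 0.1),
    "train-departure": (0.1, 0.9),
    "attraction-area": (0.8, 0.2),
    "taxi-departure": (0.2, 0.8),
}

K = 2
names = list(slot_features)
assign, centroids = kmeans([slot_features[n] for n in names], K)
cluster_of = dict(zip(names, assign))

# One placeholder expert per cluster; in practice each would be a trainable prefix.
prefix_experts = {k: f"prefix_expert_{k}" for k in range(K)}

# An unseen-domain slot is routed to the expert of its nearest cluster (assumption).
unseen_cluster = route((0.85, 0.2), centroids)
print(cluster_of, prefix_experts[unseen_cluster])
```

In this toy setup, similar slots from different domains (the two `area` slots, the two `departure` slots) end up sharing an expert, which is the cross-domain connection the summary describes.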

Abstract

Zero-shot dialogue state tracking (DST) transfers knowledge to unseen domains, reducing the cost of annotating new datasets. Previous zero-shot DST models mainly suffer from domain transfer and partial prediction problems. To address these challenges, we propose Mixture of Prefix Experts (MoPE), which establishes connections between similar slots in different domains and thereby strengthens the model's transfer performance in unseen domains. Empirical results demonstrate that MoPE-DST achieves a joint goal accuracy of 57.13% on MultiWOZ2.1 and 55.40% on SGD.
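For readers unfamiliar with the metric, joint goal accuracy is commonly computed as the fraction of dialogue turns whose predicted state matches the reference state on every slot. The sketch below assumes that standard definition; the function name and the toy dialogue states are illustrative and not taken from the paper.

```python
def joint_goal_accuracy(predictions, references):
    """Fraction of turns whose predicted state equals the reference state exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # A turn counts only if every slot-value pair matches; one wrong slot fails it.
    correct = sum(1 for pred, gold in zip(predictions, references) if pred == gold)
    return correct / len(references)

# Toy example (hypothetical states): turn 1 is fully correct, turn 2 has one wrong value.
references = [
    {"hotel-area": "centre"},
    {"hotel-area": "centre", "hotel-stars": "4"},
]
predictions = [
    {"hotel-area": "centre"},
    {"hotel-area": "centre", "hotel-stars": "3"},
]
jga = joint_goal_accuracy(predictions, references)
print(jga)
```

Because a single wrong slot invalidates the whole turn, the metric is strict, which is why per-slot errors weigh heavily on the reported numbers.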

Paper Structure (25 sections, 5 equations, 4 figures, 6 tables)

Introduction
Related Work
Dialogue State Tracking
Parameter Efficient Transfer Learning for DST
Preliminary
Methodology
Slot Clustering
Deep Prefix Prompt Tuning
Generation & Optimization
Experiments
Datasets
Baseline Models
Metrics
Settings
Main Results
...and 10 more sections

Figures (4)

Figure 1: Illustration of how dialogues in different domains share similar slot names and even the same slot values.
Figure 2: Illustration of our proposed method, including (a) Slot Clustering, (b) Deep Prefix Prompt Tuning, and (c) Multiple Prefix Prompt Generation. Slot Clustering categorizes all slots into distinct clusters and establishes connections between slots in different domains. Deep Prefix Prompt Tuning is our method for strengthening the LLM's conditional generation. Multiple Prefix Prompt Generation shows the complete pipeline for solving the DST task.
Figure 3: Zero-shot results on the attraction domain with different slot-feature representations.
Figure 4: The slot error distribution of MoPE and DPPT.
