OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking

Chia-Hsuan Lee, Hao Cheng, Mari Ostendorf

TL;DR

This work introduces OrchestraLLM, a retrieval-based routing framework that dynamically dispatches dialogue state tracking (DST) instances to either a fine-tuned small language model (SLM; Prompt-DST) or a general-purpose large language model (LLM; IC-DST). Expert pools are constructed from held-out data, and triplet representations are learned with contrastive objectives. At inference, the router selects the more reliable LM for each turn by majority vote over the retrieved exemplars, so no router needs to be trained. On MultiWOZ 2.4 and SGD, OrchestraLLM reduces computation by over 50% in FLOPs while surpassing LLM-only baselines in DST accuracy. It also generalizes across domains and datasets, and it can incorporate a new LM without retraining the retriever. The results suggest that exploiting the complementary strengths of SLMs and LLMs enables efficient, scalable DST in few-shot settings.

Abstract

Large language models (LLMs) have revolutionized the landscape of Natural Language Processing systems, but are computationally expensive. To reduce the cost without sacrificing performance, previous studies have explored various approaches to harness the potential of Small Language Models (SLMs) as cost-effective alternatives to their larger counterparts. Driven by findings that SLMs and LLMs exhibit complementary strengths in a structured knowledge extraction task, this work presents a novel SLM/LLM routing framework designed to improve computational efficiency and enhance task performance. First, exemplar pools are created to represent the types of contexts where each LM provides a more reliable answer, leveraging a sentence embedding fine-tuned so that context similarity is close to dialogue state similarity. Then, during inference, the k-nearest exemplars to the testing instance are retrieved, and the instance is routed according to majority vote. In dialogue state tracking tasks, the proposed routing framework enhances performance substantially compared to relying solely on LLMs, while reducing the computational costs by over 50%.
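
The inference step described in the abstract (retrieve the k nearest exemplars, then route by majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the `route` and `cosine` helpers, the use of cosine similarity, the two-dimensional toy embeddings, and the `"SLM"`/`"LLM"` labels are all assumptions made for illustration. In the paper, the embeddings would come from the fine-tuned sentence encoder.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def route(query, pool, k=3):
    """Pick an LM label by majority vote over the k nearest exemplars.

    `pool` is a list of (embedding, label) pairs, where the label names the
    LM judged more reliable for that exemplar. Use an odd k with two labels
    to avoid ties.
    """
    ranked = sorted(pool, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy pool: two-dimensional stand-ins for context embeddings.
POOL = [
    ([1.0, 0.0], "SLM"),
    ([0.9, 0.1], "SLM"),
    ([0.8, 0.3], "LLM"),
    ([0.0, 1.0], "LLM"),
    ([0.1, 0.9], "LLM"),
]

if __name__ == "__main__":
    print(route([1.0, 0.05], POOL, k=3))
    print(route([0.05, 1.0], POOL, k=3))
```

Because routing is a lookup over labeled exemplars rather than a trained classifier, adding a new LM in this sketch would only mean adding exemplars with a new label to the pool.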

Paper Structure

This paper contains 31 sections, 8 equations, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: Illustration of OrchestraLLM. LMs are orchestrated by a retrieval-based dynamic router. During inference, the testing instance queries the expert pools to retrieve the top-k most similar examples. An LM expert is then selected by majority vote.
  • Figure 2: Cross-domain generalization results on SGD. A dialogue is labeled In-Domain when all of its testing domains are in the training set and OOD when none of them are. All other dialogues are labeled Half OOD. We report TLB JGA for all settings. Green bars indicate OrchestraLLM with different retrievers.