CoDial: Interpretable Task-Oriented Dialogue Systems Through Dialogue Flow Alignment

Radin Shayanfar, Chu Fei Luo, Rohan Bhambhoria, Samuel Dahan, Xiaodan Zhu

TL;DR

CoDial proposes a novel interpretable paradigm for task-oriented dialogue (TOD) that translates a structured dialogue flow graph (CHIEF) into executable Colang guardrails via programmatic LLM guidance. The framework comprises CHIEF for rich task schemas, Guardrail-Grounded Code Generation (GCG) to produce guardrail code in two paradigms ($\text{CoDial}_{\text{free}}$ and $\text{CoDial}_{\text{structured}}$), and CoDial Human Feedback (CHF) to iteratively refine the generated guardrails. Empirically, CoDial achieves state-of-the-art results on STAR in strict zero-shot settings and competitive performance on MultiWOZ, while preserving interpretability and enabling expert-guided, no-code alignment of LLM behavior. The work demonstrates practical utility for high-stakes TOD and highlights avenues for improved dialogue state tracking (DST), code optimization, and scalable multi-domain extensions.

Abstract

Building Task-Oriented Dialogue (TOD) systems that generalize across different tasks remains a challenging problem. Data-driven approaches often struggle to transfer effectively to unseen tasks. While recent schema-based TOD frameworks improve generalization by decoupling task logic from language understanding, their reliance on neural or generative models often obscures how task schemas influence behaviour and hence impairs interpretability. In this work, we introduce a novel framework, CoDial (Code for Dialogue), which converts a TOD task schema, represented as a novel structured heterogeneous graph, to programmatic LLM guardrailing code, such as NVIDIA's Colang, enabling interpretable and efficient alignment of dialogue policies during inference. We introduce two paradigms, $\text{CoDial}_{\text{free}}$ and $\text{CoDial}_{\text{structured}}$, for generating LLM guardrails, and propose a feedback mechanism that integrates human feedback to iteratively improve the generated code. Empirically, CoDial achieves state-of-the-art (SOTA) performance on the widely used STAR dataset and is on par with SOTA on the MultiWOZ dataset, while also providing interpretability. We additionally demonstrate CoDial's iterative improvement via manual and LLM-aided feedback, making it a practical tool for expert-guided alignment of LLMs in high-stakes domains.

Paper Structure

This paper contains 64 sections, 3 equations, 10 figures, 7 tables, 3 algorithms.

Figures (10)

  • Figure 1: Overview of the proposed CoDial framework. An expert-curated dialogue flow (left) is transformed into executable programmatic logic using an LLM (top). The generated code is iteratively refined before producing the final program, which powers a conversational application (right), enabling the chatbot to follow the designer's requirements.
  • Figure 2: Error rate comparison of agents' predicted state on the STAR dataset across different node types, coloured by ($\text{LLM}_{\text{GCG}}$, $\text{LLM}_{\text{A}}$) pairs.
  • Figure 3: An overview of $\text{prompt}_{\text{GCG}}(x)$, where a dialogue flow $x$ is wrapped with a system prompt template.
  • Figure 4: Execution life cycle of the generated agent in $\text{CoDial}_{\text{structured}}$.
  • Figure 5: Example of the modified NeMo value_from_instruction action prompt, which is used for DST. $\color{blue} h_{2i-1}$ and $\color{red} p^{(s)}_j$ are provided in each prompt to generate a value for that slot.
  • ...and 5 more figures