LLM-Friendly Knowledge Representation for Customer Support

Hanchen Su, Wei Luo, Wei Han, Yu Elaine Liu, Yufeng Wayne Zhang, Cen Mia Zhao, Ying Joy Zhang, Yashar Mehdad

TL;DR

The paper tackles the challenge of deploying large language models for enterprise customer support by reformulating complex internal knowledge into an LLM-friendly ICA format (Intent, Context, Action) and generating synthetic data to fine-tune models. The authors propose ICA pseudocode with action IDs and an action map to streamline online action prediction, and a three-step synthetic data pipeline to enable supervised fine-tuning without heavy human annotation. They demonstrate that ICA, combined with Chain-of-Thought reasoning and synthetic data, improves accuracy and reduces manual processing time, especially for smaller open-source LLMs, while maintaining manageable latency. The work offers a scalable, cost-effective framework for knowledge representation and LLM fine-tuning that can generalize to other domains beyond Airbnb, such as legal or finance.
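The summary above does not show the paper's actual ICA syntax, so the following is only a minimal sketch of the idea: each ICA entry pairs an intent and a context condition with a short action ID, and a separate action map expands the predicted ID into the full action text. The field names, the example policies, and the helper functions `render_pseudocode` and `resolve_action` are all hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ICA:
    intent: str      # what the user is asking for
    context: str     # condition on context data, written as a pseudocode expression
    action_id: str   # short ID the model is asked to predict


# Hypothetical action map: short ID -> full action text (toy content, not real policy).
ACTION_MAP = {
    "A1": "Issue a full refund to the guest.",
    "A2": "Explain the cancellation policy; no refund applies.",
}

# Hypothetical ICA entries for one toy workflow.
ICAS = [
    ICA("guest requests refund after cancelling", "days_before_checkin >= 5", "A1"),
    ICA("guest requests refund after cancelling", "days_before_checkin < 5", "A2"),
]


def render_pseudocode(icas):
    """Render ICA entries as if/return pseudocode suitable for placing in a prompt."""
    lines = []
    for ica in icas:
        lines.append(f"if intent == {ica.intent!r} and {ica.context}:")
        lines.append(f"    return {ica.action_id!r}")
    return "\n".join(lines)


def resolve_action(predicted_id):
    """Expand a predicted action ID into the full action text via the action map."""
    return ACTION_MAP[predicted_id.strip()]


if __name__ == "__main__":
    print(render_pseudocode(ICAS))
    print(resolve_action("A1"))
```

Under this assumption the model only has to emit a few tokens (the ID) at prediction time, and the longer action text is looked up afterwards, which is one plausible reading of how an action map could streamline online prediction.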

Abstract

We propose a practical approach by integrating Large Language Models (LLMs) with a framework designed to navigate the complexities of Airbnb customer support operations. In this paper, our methodology employs a novel reformatting technique, the Intent, Context, and Action (ICA) format, which transforms policies and workflows into a structure more comprehensible to LLMs. Additionally, we develop a synthetic data generation strategy to create training data with minimal human intervention, enabling cost-effective fine-tuning of our model. Our internal experiments (not applied to Airbnb products) demonstrate that our approach of restructuring workflows and fine-tuning LLMs with synthetic data significantly enhances their performance, setting a new benchmark for their application in customer support. Our solution is not only cost-effective but also improves customer support, as evidenced by both accuracy and manual processing time evaluation metrics.

Paper Structure

This paper contains 15 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Intelligent customer support: generate the correct response based on internal workflows and context data
  • Figure 2: Converting workflows in one document from original (rich text) format to the ICA format.
  • Figure 3: Our solution includes: 1) Transforming the workflow into ICA format, thereby enhancing the interpretive abilities of language models. 2) Online prediction: retrieving relevant ICA candidates by comparing the user query with the "Intent" part of the ICAs in the knowledge base; retrieving necessary contextual data from backend APIs; and using LLMs to generate the action to take. 3) Offline training: addressing the scarcity of training data by creating it synthetically, then applying Supervised Fine-Tuning (SFT) to train the open-source language models.
  • Figure 4: Three steps of generating synthetic training data: 1) Sampling a user query and context data randomly to establish a matched branch. 2) Incorporating additional divergent branches to construct the decision trees. 3) Developing pseudocode, detailing the reasoning process, and deriving the label from the trees, then integrating these components to assemble the training dataset.
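
The three steps in the Figure 4 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the queries, context fields, action IDs, equality-only branch conditions, and every function name here are hypothetical, and the real system presumably samples from actual workflows rather than from small hard-coded lists.

```python
import random

# Hypothetical toy vocabularies (assumptions; not taken from the paper).
QUERIES = ["I want a refund", "I need to change my dates"]
CONTEXT_DOMAINS = {
    "days_before_checkin": [0, 3, 7, 14],
    "user_role": ["guest", "host"],
}
ACTION_IDS = ["A1", "A2", "A3", "A4"]


def sample_matched_branch(rng):
    """Step 1: sample a query and context, then build a branch that the context satisfies."""
    query = rng.choice(QUERIES)
    context = {f: rng.choice(vals) for f, vals in CONTEXT_DOMAINS.items()}
    field = rng.choice(sorted(CONTEXT_DOMAINS))
    matched = {"field": field, "value": context[field], "action_id": rng.choice(ACTION_IDS)}
    return query, context, matched


def add_divergent_branches(matched, context, rng, n_extra=2):
    """Step 2: add branches whose conditions do NOT hold for the sampled context."""
    other_ids = [a for a in ACTION_IDS if a != matched["action_id"]]
    branches = [matched]
    for _ in range(n_extra):
        field = rng.choice(sorted(CONTEXT_DOMAINS))
        wrong_values = [v for v in CONTEXT_DOMAINS[field] if v != context[field]]
        branches.append(
            {"field": field, "value": rng.choice(wrong_values), "action_id": rng.choice(other_ids)}
        )
    rng.shuffle(branches)  # avoid the matched branch always coming first
    return branches


def evaluate(branches, context):
    """Walk the branches in order and return the action ID of the first one that matches."""
    for b in branches:
        if context[b["field"]] == b["value"]:
            return b["action_id"]
    return None


def build_example(query, context, branches):
    """Step 3: render pseudocode, write a short reasoning trace, and derive the label."""
    pseudocode = "\n".join(
        f"if {b['field']} == {b['value']!r}: return {b['action_id']!r}" for b in branches
    )
    hit = next(b for b in branches if context[b["field"]] == b["value"])
    reasoning = (
        f"{hit['field']} is {context[hit['field']]!r}, "
        f"so the branch returning {hit['action_id']} applies."
    )
    prompt = f"Query: {query}\nContext: {context}\nWorkflow:\n{pseudocode}"
    return {"prompt": prompt, "reasoning": reasoning, "label": evaluate(branches, context)}


def generate(n, seed=0):
    """Assemble n synthetic training examples with a seeded random generator."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        query, context, matched = sample_matched_branch(rng)
        branches = add_divergent_branches(matched, context, rng)
        data.append(build_example(query, context, branches))
    return data
```

Because every divergent branch is built to fail on the sampled context, the label is always the matched branch's action ID, so no human annotation is needed to obtain it. Each resulting record (prompt, reasoning, label) could then serve as one supervised fine-tuning example.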