CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs

Jingzhe Shi; Jialuo Li; Qinwei Ma; Zaiwen Yang; Huan Ma; Lei Li

TL;DR

This work tackles safe, cost-conscious customer service with LLMs by integrating user profiles and APIs through CHOPS, a classifier-executor-verifier framework. It introduces the CPHOS-dataset, consisting of a database, PDF-based guides, and QA pairs derived from real-world CPHOS interactions, to enable evaluation of LLM-based customer-service workflows. Experiments show that CHOPS achieves high accuracy (above $98\%$ on key metrics) at favorable cost compared to end-to-end LLM baselines, especially when using a 2-level classifier and a verifier; mixing LLM backbones (weaker classifiers/verifiers with a stronger executor) yields practical performance-cost trade-offs. The approach provides a scalable path to deploying LLM-driven customer service within existing systems, with robust safeguards and resource efficiency, and points to broader applicability beyond the Olympiad domain through expanded datasets and tools.

Abstract

Businesses and software platforms are increasingly turning to Large Language Models (LLMs) such as GPT-3.5, GPT-4, GLM-3, and LLaMa-2 for chat assistance with file access or as reasoning agents for customer service. However, current LLM-based customer service models have limited integration with customer profiles and lack the operational capabilities necessary for effective service. Moreover, existing API integrations emphasize diversity over the precision and error avoidance essential in real-world customer service scenarios. To address these issues, we propose an LLM agent named CHOPS (CHat with custOmer Profile in existing System), designed to: (1) efficiently utilize existing databases or systems for accessing user information or interacting with these systems following existing guidelines; (2) provide accurate and reasonable responses or carry out required operations in the system while avoiding harmful operations; and (3) leverage a combination of small and large LLMs to achieve satisfying performance at a reasonable inference cost. We introduce a practical dataset, the CPHOS-dataset, which includes a database, guiding files, and QA pairs collected from CPHOS, an online platform that facilitates the organization of simulated Physics Olympiads for high school teachers and students. We have conducted extensive experiments to validate the performance of our proposed CHOPS architecture using the CPHOS-dataset, with the aim of demonstrating how LLMs can enhance or serve as alternatives to human customer service. Code for our proposed architecture and dataset can be found at https://github.com/JingzheShi/CHOPS.
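
To make the classifier-executor-verifier idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each component is a plain text-in, text-out callable, that the classifier returns a route label, and that a rejected proposal is retried with the verifier's feedback appended. All names (`run_pipeline`, `Result`, the toy components) are hypothetical and do not come from the CHOPS repository.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for any chat-completion call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Result:
    answer: Optional[str]  # accepted proposal, or None if every attempt was rejected
    attempts: int          # number of executor calls made
    accepted: bool


def run_pipeline(query: str, classifier: LLM, executor: LLM, verifier: LLM,
                 max_retries: int = 2) -> Result:
    """Route the query, draft a proposal, and return it only if the verifier accepts."""
    route = classifier(query)  # assumed labels, e.g. "file" or "system"
    feedback = ""
    for attempt in range(1, max_retries + 2):
        draft = executor(f"[route={route}] {query}{feedback}")
        verdict = verifier(f"query: {query}\nproposal: {draft}")
        if verdict.strip().lower().startswith("accept"):
            return Result(draft, attempt, True)
        # Assumption: the rejection reason is fed back to the executor on retry.
        feedback = f"\nPrevious attempt rejected: {verdict}"
    return Result(None, max_retries + 1, False)


# Toy components so the sketch runs without any model or API key.
def toy_classifier(query: str) -> str:
    return "system" if "score" in query.lower() else "file"


def make_toy_executor() -> LLM:
    calls = {"n": 0}

    def executor(prompt: str) -> str:
        calls["n"] += 1
        # The first draft is deliberately harmful so the verifier has work to do.
        return "DELETE all scores" if calls["n"] == 1 else "SELECT score for this user"

    return executor


def toy_verifier(prompt: str) -> str:
    proposal = prompt.split("proposal:", 1)[1]
    return "reject: destructive operation" if "DELETE" in proposal else "accept"


if __name__ == "__main__":
    print(run_pipeline("What is my score?", toy_classifier,
                       make_toy_executor(), toy_verifier))
```

In a setup like this, the classifier and verifier slots could be filled by a cheaper model and the executor slot by a stronger one, which is the kind of backbone mixing the TL;DR describes.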

Paper Structure (28 sections, 2 equations, 5 figures, 8 tables)

Introduction
Related Works
Retrieval-Augmented Generation with LLMs.
LLM Agents.
LLMs tools.
CPHOS-dataset: A real-scene dataset for customer service
Database
PDF-based guides
Methods
Framework Overview
Input Classifier
Executor
Verifier
Tools used
Experiments
...and 13 more sections

Figures (5)

Figure 1: Left: Existing scenarios for Customer Service require File QA and System Manipulation. Middle: Possible mistakes in Customer Service. Accuracy is needed in this scenario, especially to avoid harmful operations. Right: Existing methods for using LLMs as assistants. LLMs for APIs, such as ToolLLM [apillm2], mainly focus on a large number of APIs in API hubs.
Figure 2: Dataset Examples include guide file-related QAs on the left; in the middle and right, there are system-related QAs and instructions. For the same query, results may differ based on the Query User Status (middle). Similarly, for the same API, the outcome of calling it may vary.
Figure 3: Our CHOPS architecture, including the Classifier, Executor, and Verifier.
Figure 4: Classifier Architecture. Left: a binary 1-level Classifier. Right: a 2-level Classifier.
Figure 5: Effectiveness and Efficiency of 2-level Classifier, Executor and Verifier in our proposed CHOPS-architecture. Blue dots and lines: average accuracy for $\text{Acc}_{\text{sys}}$ and $\text{Acc}_{\text{file}}$. Green bar chart: relative cost estimated compared to Executor only with gpt-4-0125-preview backbone. Baselines: gpt-4-0125-preview and gpt-3.5-turbo.
