LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
Varsha Embar, Ritvik Shrivastava, Vinay Damodaran, Travis Mehlinger, Yu-Chung Hsiao, Karthik Raghunathan
TL;DR
This paper addresses the challenge of extracting actionable insights from contact-center transcripts by focusing on concise call drivers instead of broad topics. It presents a production-ready LLM-based system that generates call drivers and uses them for topic modeling, trend detection, and FAQ generation, while employing cost-saving strategies such as Multi-LoRA and input compression. The authors evaluate proprietary, open-weight, and fine-tuned models and demonstrate that a 4-bit LoRA-tuned Mistral-7B-Instruct-v0.2 closely matches human annotations while delivering substantial cost savings; topic modeling with HDBSCAN produces coherent clusters. The work showcases a practical, privacy-conscious deployment on Kubernetes (EKS) with spot instances, achieving scalable, low-latency contact-center analytics and actionable insights for agents and administrators.
Abstract
Large Language Models have transformed the Contact Center industry, manifesting in enhanced self-service tools, streamlined administrative processes, and augmented agent productivity. This paper describes our system for automated call driver generation, which serves as the foundation for tasks such as topic modeling, incoming call classification, trend detection, and FAQ generation, delivering actionable insights for contact center agents and administrators. We present a cost-efficient LLM system design, with 1) a comprehensive evaluation of proprietary, open-weight, and fine-tuned models, 2) cost-efficient strategies, and 3) the corresponding cost analysis when deployed in production environments.
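To make the downstream use of call drivers concrete, the following is a minimal sketch of one such task, trend detection, assuming each call has already been reduced to a date and a topic label derived from its call driver. The function names, thresholds, and sample records are hypothetical and chosen only for illustration; this is not the method described in the paper, which may differ substantially.

```python
from collections import Counter
from datetime import date


def weekly_topic_counts(records):
    """Count calls per (ISO year, ISO week) and topic label.

    `records` is an iterable of (date, topic) pairs; the topic is assumed
    to come from an upstream call-driver and topic-modeling step.
    """
    counts = {}
    for day, topic in records:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        counts.setdefault(week, Counter())[topic] += 1
    return counts


def rising_topics(counts, week, prev_week, min_ratio=2.0, min_count=3):
    """Return topics whose volume grew by at least `min_ratio` week over week.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    current = counts.get(week, Counter())
    previous = counts.get(prev_week, Counter())
    rising = []
    for topic, n in current.items():
        # Treat a topic unseen last week as having a baseline of one call.
        baseline = max(previous.get(topic, 0), 1)
        if n >= min_count and n / baseline >= min_ratio:
            rising.append(topic)
    return sorted(rising)


# Made-up sample data for illustration only.
records = [
    (date(2024, 3, 4), "billing dispute"),
    (date(2024, 3, 5), "password reset"),
    (date(2024, 3, 6), "password reset"),
    (date(2024, 3, 11), "password reset"),
    (date(2024, 3, 12), "billing dispute"),
    (date(2024, 3, 12), "billing dispute"),
    (date(2024, 3, 13), "billing dispute"),
    (date(2024, 3, 14), "password reset"),
]

counts = weekly_topic_counts(records)
trending = rising_topics(counts, week=(2024, 11), prev_week=(2024, 10))
print(trending)
```

In this toy data, "billing dispute" rises from one call to three between consecutive weeks and is flagged, while "password reset" stays flat. A production system would likely need smoothing and a longer baseline window rather than a single week-over-week ratio.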
