Uncovering Customer Issues through Topological Natural Language Analysis
Shu-Ting Pi, Sidarth Srinivasan, Yuying Zhu, Michael Yang, Qun Liu
TL;DR
This work addresses the challenge of extracting emerging and trending customer issues from massive, unlabeled transcript data by integrating a sentence-level attention model with topological data analysis. The method first tags the primary customer question and produces sentence embeddings, which are whitened to an isotropic space before forming an undirected similarity graph across time windows. Centrality-based measures, including matched and mismatched decay centralities, identify topics that are growing or shifting over time, yielding trending and emerging scores. Validation against human annotations and external signals (forums and news) demonstrates that the approach captures meaningful business-relevant topics and is robust to hyperparameter choices, enabling rapid, data-driven operational insights.
Abstract
E-commerce companies handle a high volume of customer service requests every day. A simple annotation system is often used to summarize the topics of customer contacts, but exploring each specific issue in depth remains challenging. This is a critical concern during an emerging outbreak, when companies must identify and address specific issues quickly. To tackle this challenge, we propose a novel machine learning algorithm that combines natural language processing techniques with topological data analysis to monitor emerging and trending customer issues. Our approach uses an end-to-end deep learning framework that simultaneously tags the primary question sentence of each customer's transcript and generates sentence embedding vectors. We then whiten the embedding vectors and use them to construct an undirected graph. From there, we define trending and emerging issues based on the topological properties of each transcript in the graph. We validated our results against human annotations and external signals (forums and news) and found them to be highly consistent with news sources.
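The whitening and graph-construction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cosine-similarity threshold of 0.5, and the use of plain degree centrality as the node-level measure are all assumptions made for illustration, and the random matrix stands in for real sentence embeddings.

```python
import numpy as np


def whiten(embeddings):
    """Center the embeddings and rescale them so their covariance is ~identity."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(embeddings) - 1)
    # The covariance matrix is symmetric, so eigh is the appropriate solver.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Guard against zero or slightly negative eigenvalues (assumed floor).
    eigvals = np.maximum(eigvals, 1e-8)
    # Scale each eigenvector column by 1/sqrt(eigenvalue).
    transform = eigvecs / np.sqrt(eigvals)
    return centered @ transform


def similarity_graph(vectors, threshold=0.5):
    """Boolean adjacency matrix of an undirected graph.

    Two transcripts are linked when the cosine similarity of their
    vectors is at least `threshold` (a hypothetical choice).
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    adjacency = (unit @ unit.T) >= threshold
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is connected to."""
    n = adjacency.shape[0]
    return adjacency.sum(axis=1) / max(n - 1, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for sentence embeddings: 200 transcripts, 8 dimensions,
    # deliberately anisotropic and shifted away from the origin.
    fake_embeddings = rng.normal(size=(200, 8)) * np.arange(1, 9) + 3.0
    whitened = whiten(fake_embeddings)
    graph = similarity_graph(whitened, threshold=0.5)
    scores = degree_centrality(graph)
    print("most central transcript index:", int(scores.argmax()))
```

In this sketch a transcript with high centrality sits in a dense region of similar transcripts. Comparing such scores across time windows is one plausible way to surface growing topics, although the trending and emerging scores in this work are defined through decay centralities rather than the degree centrality used here.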
