NLP-based detection of systematic anomalies among the narratives of consumer complaints
Peiheng Gao, Ning Sun, Xuefeng Wang, Chen Yang, Ričardas Zitikis
TL;DR
The paper tackles the detection of systematic nonmeritorious consumer complaints in narrative data by coupling an NLP-based classifier with downstream anomaly detection on quantified narrative signals. It introduces two input-output systems, based on TF-IDF and TF-IDF-VADER featurizations, and analyzes them with two indices, the I-index and the B-index, to identify background risk without specifying distributions. A Cobb-Douglas-like relationship links sentiment, adjusted dollars, and word counts to produce transferable signals, enabling robust anomaly detection on the subset that the classifier labels as meritorious. Empirically, SVM with TF-IDF achieves the strongest classification performance, while TF-IDF-VADER tends to reduce the number of nonmeritorious cases in the meritorious set, offering practical guidance for prioritizing relief in CFPB data and similar regulatory contexts.
Abstract
We develop an NLP-based procedure for detecting systematic nonmeritorious consumer complaints, simply called systematic anomalies, among complaint narratives. While classification algorithms are used to detect pronounced anomalies, in the case of smaller and frequent systematic anomalies, the algorithms may falter due to a variety of reasons, including technical ones as well as natural limitations of human analysts. Therefore, as the next step after classification, we convert the complaint narratives into quantitative data, which are then analyzed using an algorithm for detecting systematic anomalies. We illustrate the entire procedure using complaint narratives from the Consumer Complaint Database of the Consumer Financial Protection Bureau.
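To make the step of converting narratives into quantitative data concrete, the following is a minimal sketch of a TF-IDF featurization together with word counts, two of the quantities named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the three narratives are made-up toy strings, the helper names `tokenize` and `tfidf` are hypothetical, and the smoothed idf formula and L2 normalization are one common convention chosen here for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplification)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed convention: raw term counts times the smoothed idf
    ln((1 + n) / (1 + df)) + 1, followed by L2 normalization.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: tf[t] * idf[t] for t in tf}
        # Guard against division by zero for an empty document.
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


# Toy narratives invented for this sketch; not taken from the CFPB database.
narratives = [
    "The bank charged me a late fee twice.",
    "I was charged a fee I never agreed to.",
    "The lender reported wrong information to the credit bureau.",
]

vectors = tfidf(narratives)
word_counts = [len(tokenize(d)) for d in narratives]
```

In this sketch a term that appears in fewer narratives, such as "bank", receives a larger weight than one shared across narratives, such as "fee". The resulting vectors could then be passed to a classifier such as an SVM, and the word counts retained as one of the quantitative signals for the subsequent anomaly-detection step.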
