Demystifying the Accuracy-Interpretability Trade-Off: A Case Study of Inferring Ratings from Reviews
Pranjal Atrey, Michael P. Brundage, Min Wu, Sanghamitra Dutta
TL;DR
This paper investigates the accuracy-interpretability trade-off through the task of inferring product ratings from textual reviews, and introduces Composite Interpretability (CI), a score for ranking both interpretable and composite models. It systematically compares VADER, several text representations (CountVectorizer, TF-IDF, Word2Vec), classifiers (LR, NB, SVM, NN), and a BERT sentiment model, including composite configurations that fuse BERT sentiment with traditional features. The findings show a general tendency for accuracy to rise as interpretability declines, but the relationship is not strictly monotonic, and interpretable models can outperform some black-box setups. NB with Word2Vec, in particular, performs poorly due to NB's feature independence assumption. The CI framework, supplemented by expert judgments, provides guidance for model selection in NLP tasks where transparency matters. Future work targets analytical interpretability metrics and explainable-AI tools such as LIME.
Abstract
Interpretable machine learning models offer understandable reasoning behind their decision-making process, though they may not always match the performance of their black-box counterparts. This trade-off between interpretability and model performance has sparked discussions around the deployment of AI, particularly in critical applications where knowing the rationale of decision-making is essential for trust and accountability. In this study, we conduct a comparative analysis of several black-box and interpretable models, focusing on a specific NLP use case that has received limited attention: inferring ratings from reviews. Through this use case, we explore the intricate relationship between the performance and interpretability of different models. We introduce a quantitative score called Composite Interpretability (CI) to help visualize the trade-off between interpretability and performance, particularly in the case of composite models. Our results indicate that, in general, the learning performance improves as interpretability decreases, but this relationship is not strictly monotonic, and there are instances where interpretable models are more advantageous.
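To make the use case concrete, the following is a minimal sketch of one interpretable configuration of the kind compared in this study: bag-of-words counts fed to a multinomial naive Bayes classifier. It is not the paper's implementation. The toy reviews and ratings, the function names (tokenize, train_nb, predict, word_evidence), and the Laplace smoothing constant alpha are all illustrative assumptions, and only the Python standard library is used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,!?;:") for w in text.lower().split())
    return [w for w in tokens if w]


def train_nb(reviews, ratings, alpha=1.0):
    """Fit multinomial naive Bayes on word counts (alpha is an assumed smoothing value)."""
    word_counts = defaultdict(Counter)  # rating -> word -> count
    class_counts = Counter(ratings)
    vocab = set()
    for text, rating in zip(reviews, ratings):
        tokens = tokenize(text)
        word_counts[rating].update(tokens)
        vocab.update(tokens)

    n = len(ratings)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def predict(model, text):
    """Return the rating with the highest posterior log-score."""
    log_prior, log_like = model
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in tokenize(text) if w in log_like[c]
        )
    return max(scores, key=scores.get)


def word_evidence(model, word, rating_a, rating_b):
    """Log-likelihood ratio of a word for rating_a versus rating_b."""
    _, log_like = model
    return log_like[rating_a][word] - log_like[rating_b][word]


# Hypothetical toy data, not taken from the paper.
reviews = [
    "great product, works perfectly",
    "terrible quality, broke quickly",
    "great value and great quality",
    "terrible, do not buy",
]
ratings = [5, 1, 5, 1]

model = train_nb(reviews, ratings)
print(predict(model, "great quality"))
print(word_evidence(model, "great", 5, 1))
```

The interpretability of such a model comes from the fact that every prediction decomposes into per-word contributions: word_evidence shows directly how strongly a word such as "great" pushes toward one rating over another, which is the kind of rationale a black-box model does not expose without additional tooling.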
