Unveiling Dual Quality in Product Reviews: An NLP-Based Approach
Rafał Poświata, Marcin Michał Mirończuk, Sławomir Dadas, Małgorzata Grębowiec, Michał Perełkiewicz
TL;DR
This study applies NLP to the dual quality problem in product reviews. It builds a Polish dataset of 1,957 reviews, 540 of which are dual-quality cases, and evaluates models ranging from SetFit to large language models. It shows that language-specific transformer encoders can rival or exceed LLMs, and that prompt-based improvements for LLMs are not guaranteed. The work also assesses multilingual transfer and outlines a practical deployment path for a Polish consumer-protection agency, with guidance on model selection and integration. Overall, it covers the complete workflow, from data collection and labeling through model evaluation and robustness checks to deployment considerations with clear regulatory relevance.
Abstract
Consumers often face inconsistent product quality, particularly when identical products vary between markets, a situation known as the dual quality problem. To identify and address this issue, automated techniques are needed. This paper explores how natural language processing (NLP) can aid in detecting such discrepancies and presents the full process of developing a solution. First, we describe in detail the creation of a new Polish-language dataset of 1,957 reviews, 540 of which highlight dual quality issues. We then discuss experiments with several approaches, including SetFit with sentence-transformers, transformer-based encoders, and LLMs, together with error analysis and robustness verification. Additionally, we evaluate multilingual transfer using a subset of opinions in English, French, and German. The paper concludes with insights on deployment and practical applications.
