Explained anomaly detection in text reviews: Can subjective scenarios be correctly evaluated?

David Novoa-Paradela, Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas

TL;DR

The pipeline comprises three modules and detects reviews that provide no value to users, whether because they are worthless or maliciously written. Each classification is accompanied by a normality score and an explanation that justifies the decision.

Abstract

This paper presents a pipeline to detect and explain anomalous reviews on online platforms. The pipeline comprises three modules and detects reviews that provide no value to users, whether because they are worthless or maliciously written. Each classification is accompanied by a normality score and an explanation that justifies the decision. The pipeline's ability to solve the anomaly detection task was evaluated on several datasets created from a large Amazon database. Additionally, a study with 241 participants compared three explainability techniques to assess the explainability module. The study measured the impact of the explanations on the respondents' ability to reproduce the classification model, as well as the perceived usefulness of those explanations. This work can help automate tasks on online review platforms, such as those for electronic commerce, and offers inspiration for addressing similar problems in anomaly detection on textual data. We also consider it valuable to have carried out a human evaluation of different explainability techniques in a real and infrequent scenario such as the detection of anomalous reviews, and to reflect on whether tasks as subjective to humans as this one can be explained.

Paper Structure

This paper contains 18 sections, 5 equations, 6 figures, and 10 tables.

Figures (6)

  • Figure 1: General modules that form the proposed pipeline.
  • Figure 2: Proposed pipeline considering the product "chocolate bars" as the normal class.
  • Figure 3: Explanation generated by SHAP for the anomalous review detection problem addressed in this work. In this scenario, the reviews to be analysed correspond to the product "chocolate bars" (Bars). The review in this example was correctly classified as normal. The terms that most influence its classification as normal are marked in red, while the terms that push towards the opposite class are highlighted in blue; there are practically none here because this is an obvious case. Greater colour intensity implies greater influence. In this case, the terms that most influenced the classification as normal are "snack" and "tasting".
  • Figure 4: Initial prompt that presents the anomaly detection problem to GPT-3.
  • Figure 5: Prompt in which the classification of a review is explained by GPT-3. The product considered as normal is chocolate bars and the review has been classified as normal by the anomaly detection model.
  • ...and 1 more figure