Enhancing Disinformation Detection with Explainable AI and Named Entity Replacement

Santiago González-Silot, Andrés Montoro-Montarroso, Eugenio Martínez Cámara, Juan Gómez-Romero

TL;DR

This research work hypothesises that text classification methods are not able to capture the nuances of disinformation and often ground their decisions in superfluous features, so it applies a post-hoc explainability method (SHAP, SHapley Additive exPlanations) to identify spurious elements with high impact on the classification models.
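
SHAP attributes a model's prediction to its input tokens using Shapley values. As a minimal sketch (not the authors' pipeline), the code below computes exact Shapley values for each token of a short text by masking tokens and enumerating every coalition. The bag-of-words scorer `score` and its `WEIGHTS` are hypothetical stand-ins for a trained classifier; for real models, where exhaustive enumeration is infeasible, the SHAP library estimates these attributions instead.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy classifier: positive scores push towards "fake news".
# The weights are invented; "rwanda" deliberately acts as a spurious cue.
WEIGHTS = {"rwanda": 2.0, "shocking": 1.0, "government": 0.2}


def score(tokens):
    """Toy model output on the tokens that are kept (the rest are masked)."""
    return sum(WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def shapley_values(tokens, model):
    """Exact Shapley value of each token position under token masking.

    Enumerates every coalition of the other tokens, so the cost grows as 2**n;
    this is only practical for very short inputs.
    """
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                phi += weight * (model(with_i) - model(without_i))
        values.append(phi)
    return values


tokens = "Shocking news from Rwanda".split()
for tok, phi in zip(tokens, shapley_values(tokens, score)):
    print(f"{tok}: {phi:+.2f}")
# prints one line per token: Shocking: +1.00, news: +0.00, from: +0.00, Rwanda: +2.00
```

Because the toy model is additive, each token's Shapley value equals its weight, so the named entity "Rwanda" dominates the explanation: the kind of spurious signal a SHAP analysis is meant to surface.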

Abstract

The automatic detection of disinformation presents a significant challenge in the field of natural language processing. This task addresses a multifaceted societal and communication issue, which needs approaches that extend beyond the identification of general linguistic patterns through data-driven algorithms. In this research work, we hypothesise that text classification methods are not able to capture the nuances of disinformation and that they often ground their decisions in superfluous features. Hence, we apply a post-hoc explainability method (SHAP, SHapley Additive exPlanations) to identify spurious elements with high impact on the classification models. Our findings show that non-informative elements (e.g., URLs and emoticons) should be removed and named entities (e.g., Rwanda) should be pseudo-anonymized before training, to avoid biasing the models and to increase their generalization capabilities. We evaluate this methodology on an internal dataset and an external dataset, before and after applying extended data preprocessing and named entity replacement. The results show that, on average, our proposal improves the performance of a disinformation classification method on external test data by 65.78%, without a significant decrease in internal test performance.
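
The extended preprocessing and named entity replacement described above can be sketched as follows, under stated assumptions: the regular expressions `URL_RE` and `EMOTICON_RE` are illustrative, since the exact cleaning rules are not given here, and `replace_entities` expects character spans from an external NER tagger (not included). The bracketed type placeholder such as `[LOC]` is one hypothetical pseudo-anonymization scheme. Entity spans should be computed on the already-cleaned text, because cleaning shifts character offsets.

```python
import re

# Illustrative patterns (assumed, not the paper's exact rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# ASCII emoticons such as :) ;-( =D, plus common emoji code-point ranges.
EMOTICON_RE = re.compile(r"[:;=]-?[()DPp]|[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def remove_noise(text):
    """Drop URLs and emoticons, then collapse repeated whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    return " ".join(text.split())


def replace_entities(text, entities):
    """Replace each (start, end, label) character span with a "[LABEL]" placeholder.

    Spans are assumed to be non-overlapping and to come from an NER tagger
    run on `text` itself (hypothetical input, not produced here).
    """
    pieces, last = [], 0
    for start, end, label in sorted(entities):
        pieces.append(text[last:start])
        pieces.append(f"[{label}]")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


clean = remove_noise("Breaking :( Rwanda bans phones https://example.com 😱")
print(clean)  # Breaking Rwanda bans phones
spans = [(9, 15, "LOC")]  # hypothetical NER output for "Rwanda"
print(replace_entities(clean, spans))  # Breaking [LOC] bans phones
```

Replacing an entity with its type keeps the information that a location is mentioned while hiding which one, so a classifier can no longer key on a specific name such as "Rwanda".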

Paper Structure

This paper contains 14 sections, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: The outline of our methodology to enhance disinformation detection.
  • Figure 2: Two examples of sentences analyzed with SHAP. Words identified as more relevant to the "true news" category are highlighted in red, while those associated with the "fake news" category are highlighted in blue. The color intensity reflects each word's importance to the model's prediction for that particular sentence.
  • Figure 3: SHAP global bar plot, a global feature importance plot in which the global importance of each word is the mean (or sum) of its absolute SHAP values over the whole dataset.
  • Figure 4: Two examples of local explanations using SHAP after applying the methodology proposed in this paper. Words identified as more relevant to the "true news" category are highlighted in red, while those associated with the "fake news" category are highlighted in blue. The color intensity reflects each word's importance to the model's prediction for that particular sentence.
  • Figure 5: SHAP Global Bar plot (sum) after applying the methodology proposed in this paper.
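
The global bar plots in Figures 3 and 5 rank words by aggregating local SHAP values over the dataset. A minimal sketch of that aggregation follows, assuming each local explanation is a list of `(word, shap_value)` pairs; the input format and the function name `global_importance` are hypothetical. Here "mean" averages over a word's occurrences, which is one reading of the caption; averaging over all texts, counting absences as zero, would be another.

```python
from collections import defaultdict


def global_importance(explanations, agg="mean"):
    """Rank words by the mean or sum of their absolute local SHAP values.

    `explanations` holds one list of (word, shap_value) pairs per text
    (hypothetical format). With agg="mean", values are averaged over the
    word's occurrences.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pairs in explanations:
        for word, value in pairs:
            totals[word] += abs(value)  # direction is ignored; only magnitude counts
            counts[word] += 1
    if agg == "sum":
        scores = dict(totals)
    elif agg == "mean":
        scores = {w: totals[w] / counts[w] for w in totals}
    else:
        raise ValueError("agg must be 'mean' or 'sum'")
    # Highest-impact words first, as in a SHAP global bar plot.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


explanations = [
    [("Rwanda", 0.8), ("says", -0.1)],
    [("Rwanda", -0.4), ("http", 0.3)],
]
for word, importance in global_importance(explanations, agg="sum"):
    print(f"{word}: {importance:.2f}")
# prints: Rwanda: 1.20, http: 0.30, says: 0.10 (one per line)
```

Taking absolute values means a word that pushes strongly towards "true news" in one text and towards "fake news" in another still ranks high, which is what makes such a plot useful for spotting spurious features like entity names or URL fragments.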