OpenFact at CheckThat! 2024: Combining Multiple Attack Methods for Effective Adversarial Text Generation

Włodzimierz Lewoniewski; Piotr Stolarski; Milena Stróżyna; Elzbieta Lewańska; Aleksandra Wojewoda; Ewelina Księżniak; Marcin Sawiński

TL;DR

OpenFact at CheckThat! 2024 investigates combining multiple adversarial attack methods to generate effective adversarial text for credibility assessment. The authors develop an ensemble framework consisting of BAm, GSWSE, CLARE, and Genetic-based variants, evaluated across five misinformation domains on three victim models using the BODEGA score. They show that ensemble attacks yield substantial improvements over baselines, with CLARE often providing the strongest performance, while RoBERTa remains comparatively difficult to attack. Manual evaluation reveals a gap between automatic semantic preservation scores and human judgments. The work highlights the potential of ensemble adversarial attacks to stress-test credibility assessment systems and outlines future directions involving larger text sources and LLM-based hybrids.
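
The BODEGA score used for evaluation rewards adversarial examples that both flip the victim's decision and stay close to the original text. The sketch below assumes the formulation from the BODEGA benchmark as we understand it: each example's score is the product of a confusion score (1 if the prediction changed, else 0), a semantic score clipped to [0, 1], and a character score (1 minus the Levenshtein distance normalised by the length of the longer text), averaged over all examples so that failed attacks contribute 0. The semantic similarity is passed in as a precomputed number rather than computed with BLEURT, and `levenshtein`, `char_score`, and `bodega_score` are illustrative names, not the benchmark's actual API.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_score(original: str, adversarial: str) -> float:
    """1 minus Levenshtein distance normalised by the longer string (assumed normalisation)."""
    longest = max(len(original), len(adversarial))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(original, adversarial) / longest


def bodega_score(examples: Iterable[Tuple[str, str, bool, float]]) -> float:
    """Mean of confusion * semantic * character scores over all examples.

    Each example is (original, adversarial, label_flipped, semantic), where
    `semantic` is an assumed precomputed similarity (BODEGA uses a BLEURT-based value).
    """
    items = list(examples)
    if not items:
        return 0.0
    total = 0.0
    for original, adversarial, flipped, semantic in items:
        con = 1.0 if flipped else 0.0             # attack success
        sem = min(max(semantic, 0.0), 1.0)        # clip to [0, 1]
        total += con * sem * char_score(original, adversarial)
    return total / len(items)
```

Because the three factors are multiplied, an attack that flips the label but rewrites most of the text scores little, which is why combining methods that make minimal edits can raise the overall score.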

Abstract

This paper presents the experiments and results for the CheckThat! Lab at CLEF 2024 Task 6: Robustness of Credibility Assessment with Adversarial Examples (InCrediblAE). The primary objective of this task was to generate adversarial examples in five problem domains in order to evaluate the robustness of widely used text classification methods (fine-tuned BERT, BiLSTM, and RoBERTa) when applied to credibility assessment. This study explores the application of ensemble learning to enhance adversarial attacks on natural language processing (NLP) models. We systematically tested and refined several adversarial attack methods, including BERT-Attack, Genetic algorithms, TextFooler, and CLARE, on five datasets across various misinformation tasks. By developing modified versions of BERT-Attack and hybrid methods, we achieved significant improvements in attack effectiveness. Our results demonstrate the potential of modifying and combining multiple methods to create more sophisticated and effective adversarial attack strategies, contributing to the development of more robust and secure systems.
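
The abstract does not spell out how outputs from the individual methods are merged. One simple way to combine several attacks, shown here purely as an assumption and not as the authors' procedure, is to run every method on the same input, discard candidates that fail to flip the victim's prediction, and keep the surviving candidate with the best quality score (for example, a per-example BODEGA-style product). The names `ensemble_attack`, `is_flipped`, and `quality` are hypothetical; in practice the attack functions could wrap BERT-Attack, TextFooler, CLARE, or a genetic search, but none of those are implemented here.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interfaces, not taken from the paper or from any specific library.
Attack = Callable[[str], Optional[str]]   # text -> adversarial candidate, or None on failure
Checker = Callable[[str, str], bool]      # (original, candidate) -> did the victim's label flip?
Scorer = Callable[[str, str], float]      # (original, candidate) -> quality, higher is better


def ensemble_attack(
    text: str,
    attacks: Dict[str, Attack],
    is_flipped: Checker,
    quality: Scorer,
) -> Optional[Tuple[str, str]]:
    """Run every attack and return (method_name, candidate) for the best successful one.

    Returns None when no method produces a candidate that flips the prediction.
    """
    best: Optional[Tuple[str, str]] = None
    best_score = float("-inf")
    for name, attack in attacks.items():
        candidate = attack(text)
        # Skip methods that gave up or did not change the victim's decision.
        if candidate is None or not is_flipped(text, candidate):
            continue
        score = quality(text, candidate)
        if score > best_score:
            best, best_score = (name, candidate), score
    return best
```

Selecting per example rather than per dataset lets each input benefit from whichever method happens to suit it best, at the cost of running every attack on every input.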