Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks
Yang Wang, Chenghua Lin
TL;DR
This paper tackles the lack of a comprehensive benchmark for textual adversarial defence by introducing an extensive framework that spans multiple NLP datasets and tasks. It evaluates a range of defence methods and proposes TTSO++, an entropy-informed extension of training-time temperature scaling, to improve robustness. The study shows that regularisation-based approaches, particularly TTSO++, achieve state-of-the-art performance with modest runtime overhead across both encoder-based and embedding-based models, and it identifies limitations of certain baselines such as Flooding-X. The benchmark and findings provide a scalable foundation for developing task-general, efficient adversarial defences in NLP, with practical implications for deploying robust systems.
Abstract
Recent advancements in natural language processing have highlighted the vulnerability of deep learning models to adversarial attacks. While various defence mechanisms have been proposed, there is a lack of comprehensive benchmarks that evaluate these defences across diverse datasets, models, and tasks. In this work, we address this gap by presenting an extensive benchmark for textual adversarial defence that significantly expands upon previous work. Our benchmark incorporates a wide range of datasets, evaluates state-of-the-art defence mechanisms, and extends the assessment to include critical tasks such as single-sentence classification, similarity and paraphrase identification, natural language inference, and commonsense reasoning. This work not only serves as a valuable resource for researchers and practitioners in the field of adversarial robustness but also identifies key areas for future research in textual adversarial defence. By establishing a new standard for benchmarking in this domain, we aim to accelerate progress towards more robust and reliable natural language processing systems.
