A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios

Samuel Ackerman; Ella Rabinovich; Eitan Farchi; Ateret Anaby-Tavor

TL;DR

A novel metric for assessing a model's robustness is proposed, and its benefits in the non-adversarial scenario are demonstrated by empirical evaluation of several models on the created datasets.

Abstract

We evaluate the robustness of several large language models on multiple datasets. Robustness here refers to the relative insensitivity of a model's answers to meaning-preserving variants of its input. Benchmark datasets are constructed by introducing naturally occurring, non-malicious perturbations, or by generating semantically equivalent paraphrases of input questions or statements. We further propose a novel metric for assessing a model's robustness, and demonstrate its benefits in the non-adversarial scenario by empirical evaluation of several models on the created datasets.

Paper Structure (22 sections, 2 equations, 3 figures, 5 tables)


Introduction
Datasets
Dataset Description
Expanding Datasets with Perturbations
Superficial (S)
Paraphrase (P)
Distraction (D)
Quantifying Model Robustness
Performance Drop Rate (PDR)
Cohen's $h$ Effect Size
Benchmarking Model Robustness
Experimental Setup
Experimental Results
Model Robustness vs Performance
Robustness Evaluation by Perturbation Type
...and 7 more sections

Figures (3)

Figure 1: Comparison of normalized Cohen's $h$ ($\tilde{\textrm{H}}$) and reverse PDR (=$-1{\times}\textrm{PDR}$) when the original instance accuracy $score_i^o{=}1.0$ (as in the tasks in our study -- binary evaluation outcome: 0 or 1).
Figure 2: Mean model accuracy on original datasets vs its undirectional robustness. x-axis: the higher, the better performing; y-axis: the lower, the more robust.
Figure 3: Mean metric scores by model and dataset. Red error bars show a 95% bootstrapped confidence interval.
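
The comparison described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions not stated on this page: Cohen's $h$ is taken in its standard arcsine-transform form, with the sign chosen so that a drop in accuracy gives a negative value; the normalization divides by $\pi$ so that values fall in $[-1, 1]$; and PDR is taken as one minus the ratio of perturbed to original accuracy. The function names are hypothetical.

```python
import math


def cohens_h(p_perturbed: float, p_original: float) -> float:
    """Cohen's h between two proportions (standard arcsine transform).

    Sign convention is an assumption: negative when the perturbed
    accuracy is lower than the original accuracy.
    """
    def phi(p: float) -> float:
        return 2.0 * math.asin(math.sqrt(p))

    return phi(p_perturbed) - phi(p_original)


def normalized_h(p_perturbed: float, p_original: float) -> float:
    """Assumed normalization: h lies in [-pi, pi], so h / pi lies in [-1, 1]."""
    return cohens_h(p_perturbed, p_original) / math.pi


def reverse_pdr(p_perturbed: float, p_original: float) -> float:
    """Reverse PDR (= -1 x PDR), assuming PDR = 1 - perturbed / original."""
    if p_original == 0:
        raise ValueError("PDR is undefined when the original accuracy is 0")
    return -(1.0 - p_perturbed / p_original)


if __name__ == "__main__":
    # Original instance accuracy fixed at 1.0, as in the Figure 1 caption.
    for p in (1.0, 0.9, 0.5, 0.0):
        print(p, round(normalized_h(p, 1.0), 4), round(reverse_pdr(p, 1.0), 4))
```

Under these assumptions the two quantities agree at perturbed accuracies of 1.0, 0.5, and 0.0, while the normalized $h$ reacts more strongly than reverse PDR to small drops near 1.0. Unlike PDR, $h$ stays defined when the original accuracy is 0.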
