Zero-shot LLM-guided Counterfactual Generation: A Case Study on NLP Model Evaluation

Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu

TL;DR

This work proposes a structured pipeline for zero-shot counterfactual generation, and hypothesizes that the instruction-following and textual understanding capabilities of recent LLMs can be leveraged to generate high-quality counterfactuals in a zero-shot manner, without any training or fine-tuning.

Abstract

With the development and proliferation of large, complex, black-box models for solving many natural language processing (NLP) tasks, there is an increasing need for methods to stress-test these models and provide some degree of interpretability or explainability. While counterfactual examples are useful in this regard, automated generation of counterfactuals is a data- and resource-intensive process. Such methods depend on models, such as pre-trained language models, that are then fine-tuned on auxiliary, often task-specific datasets, which may be infeasible to build in practice, especially for new tasks and data domains. Therefore, in this work we explore the possibility of leveraging large language models (LLMs) for zero-shot counterfactual generation in order to stress-test NLP models. We propose a structured pipeline to facilitate this generation, and we hypothesize that the instruction-following and textual understanding capabilities of recent LLMs can be effectively leveraged for generating high-quality counterfactuals in a zero-shot manner, without requiring any training or fine-tuning. Through comprehensive experiments on a variety of proprietary and open-source LLMs, along with various downstream tasks in NLP, we explore the efficacy of LLMs as zero-shot counterfactual generators in evaluating and explaining black-box NLP models.

Paper Structure

This paper contains 26 sections, 4 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Examples of an input sentence and its corresponding counterfactual examples with the same or the opposite label.
  • Figure 2: Our proposed FIZLE pipeline for zero-shot LLM-guided counterfactual generation for evaluation and explanation of black-box text classifiers.
  • Figure 3: Comparison of LFS and semantic similarity (Sem-sim) for generated counterfactual explanations for AG News (top) and SNLI (bottom). LFS (%) is scaled to 0-1; higher values are better for both metrics. dv-003 refers to text-davinci-003.