DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers

Rakesh R. Menon, Shashank Srivastava

Abstract

Despite their high predictive accuracy, current machine learning systems often exhibit systematic biases stemming from annotation artifacts or insufficient support for certain classes in the dataset. Recent work proposes automatic methods for identifying and explaining systematic biases using keywords. We introduce DISCERN, a framework for interpreting systematic biases in text classifiers using language explanations. DISCERN iteratively generates precise natural language descriptions of systematic errors by employing an interactive loop between two large language models. We then use the descriptions to improve classifiers by augmenting classifier training sets with synthetically generated instances or annotated examples via active learning. On three text-classification datasets, we demonstrate that language explanations from our framework induce consistent performance improvements that go beyond what is achievable with exemplars of systematic bias. Finally, in human evaluations, we show that users can interpret systematic biases more effectively (by over 25% relative) and more efficiently when they are described through language explanations rather than cluster exemplars.

Paper Structure

This paper contains 39 sections, 9 figures, 15 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of our classifier debugging framework, DISCERN. The framework comprises four stages: (1) clustering validation set examples to identify data sub-populations where the classifier makes the most errors, (2) cluster description generation using an explainer LLM, (3) refining cluster descriptions through interaction between the explainer and evaluator for higher precision, and (4) model refinement through dataset aggregation.
  • Figure 2: Example of descriptions generated by DISCERN and DISCERN-F for an underperforming cluster in the AGNews dataset. Example descriptions for other datasets can be found in the Appendix.
  • Figure 3: Average accuracy of distilbert-base-uncased classifiers after augmenting the training set with examples identified and annotated from a large unlabeled pool using different approaches. Shaded regions indicate the standard deviation over five runs.
  • Figure 4: Zero-shot performance of different language models used as predicate evaluators for our task.
  • Figure 5: Descriptions generated using DISCERN-F and DISCERN for erroneous clusters in different datasets using the distilbert-base-uncased classifier.
  • ...and 4 more figures
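
The first three stages named in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the stopping threshold, the round limit, and the keyword-based stand-ins for the explainer and evaluator LLMs are all assumptions made for illustration. The sketch assumes cluster assignments are already available, ranks clusters by classifier error rate, and then refines a description until the evaluator judges it precise enough.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def rank_clusters_by_error(
    cluster_ids: Sequence[int], correct: Sequence[bool]
) -> List[Tuple[int, float]]:
    """Stage 1 (after clustering): (cluster_id, error_rate) pairs, worst first."""
    totals: Dict[int, int] = defaultdict(int)
    errors: Dict[int, int] = defaultdict(int)
    for cid, ok in zip(cluster_ids, correct):
        totals[cid] += 1
        if not ok:
            errors[cid] += 1
    rates = [(cid, errors[cid] / totals[cid]) for cid in totals]
    # Highest error rate first; ties broken by cluster id for determinism.
    return sorted(rates, key=lambda pair: (-pair[1], pair[0]))


def refine_description(
    in_cluster: Sequence[str],
    out_cluster: Sequence[str],
    explain: Callable[[Sequence[str], Sequence[str]], str],
    matches: Callable[[str, str], bool],
    max_rounds: int = 3,            # assumed round limit
    target_precision: float = 0.9,  # assumed stopping threshold
) -> Tuple[str, float]:
    """Stages 2-3: propose a description, then refine it for higher precision.

    `explain(examples, false_positives)` stands in for the explainer LLM and
    `matches(description, text)` for the evaluator LLM; both are hypothetical.
    """
    false_positives: List[str] = []
    description, precision = "", 0.0
    for _ in range(max_rounds):
        description = explain(in_cluster, false_positives)
        hits_in = sum(matches(description, text) for text in in_cluster)
        false_positives = [t for t in out_cluster if matches(description, t)]
        covered = hits_in + len(false_positives)
        precision = hits_in / covered if covered else 0.0
        if precision >= target_precision:
            break
    return description, precision


# Toy stand-ins for the two LLMs (keyword matching only, for illustration).
def toy_explain(examples: Sequence[str], false_positives: Sequence[str]) -> str:
    # Start broad; narrow the description once false positives are reported.
    return "mentions stock prices" if false_positives else "mentions stock"


def toy_matches(description: str, text: str) -> bool:
    keyword = description.replace("mentions ", "")
    return keyword in text.lower()


if __name__ == "__main__":
    print(rank_clusters_by_error([0, 0, 1, 1, 1, 2],
                                 [True, False, False, False, True, True]))
    print(refine_description(
        ["Stock prices fell sharply", "Tech stock prices rally"],
        ["Kitchen stock recipe", "Team wins final"],
        toy_explain,
        toy_matches,
    ))
```

In this toy run the first description ("mentions stock") also matches an out-of-cluster example, so the explainer is called again with that false positive and returns a narrower description. In a real setting both callables would wrap LLM prompts, and the feedback passed to the explainer could be richer than a list of false positives.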