Predicting generalization performance with correctness discriminators

Yuekun Yao; Alexander Koller

TL;DR

This work presents a novel model that establishes upper and lower bounds on the accuracy of an NLP model on unseen data, without requiring gold labels. It does so by training a discriminator that predicts whether the output of a given sequence-to-sequence model is correct.

Abstract

The ability to predict an NLP model's accuracy on unseen, potentially out-of-distribution data is a prerequisite for trustworthiness. We present a novel model that establishes upper and lower bounds on this accuracy, without requiring gold labels for the unseen data. We achieve this by training a discriminator that predicts whether the output of a given sequence-to-sequence model is correct. We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy reliably falls between the predicted upper and lower bounds, and that these bounds are remarkably close together.
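
To make the idea of label-free accuracy bounds concrete, here is a minimal sketch. The abstract does not say how the bounds are derived from discriminator predictions, so the aggregation below is purely an assumption for illustration: it supposes several hypothetical discriminators each vote on whether an output is correct, takes the lower bound as the fraction of outputs that all of them accept, and the upper bound as the fraction that at least one accepts. The function name `accuracy_bounds` and the `votes` data are invented for this example.

```python
from typing import Sequence, Tuple


def accuracy_bounds(votes: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Estimate (lower, upper) accuracy bounds without gold labels.

    votes[i][j] is True if hypothetical discriminator j judges the
    model's output on unlabeled instance i to be correct.
    ASSUMPTION: all-agree / any-agree aggregation is an illustrative
    choice, not necessarily the procedure used in the paper.
    """
    if not votes:
        raise ValueError("need at least one instance")
    n = len(votes)
    # Lower bound: instances that every discriminator accepts.
    lower = sum(all(v) for v in votes) / n
    # Upper bound: instances that at least one discriminator accepts.
    upper = sum(any(v) for v in votes) / n
    return lower, upper


# Made-up votes from three discriminators on four unlabeled outputs.
votes = [
    [True, True, True],
    [True, False, True],
    [False, False, False],
    [True, True, True],
]
lower, upper = accuracy_bounds(votes)
print(f"predicted accuracy in [{lower:.2f}, {upper:.2f}]")
```

Under this assumed scheme the lower bound can never exceed the upper bound, and the gap between them shrinks as the discriminators agree more often.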