From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items

Melissa Roemmele; Andrew S. Gordon

TL;DR

This paper considers LLMs as authors of commonsense assessment items, and finds that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items.

Abstract

LLMs can now perform a variety of complex writing tasks. They also excel at answering questions pertaining to natural language inference and commonsense reasoning. Composing these questions is itself a skilled writing task, so in this paper we consider LLMs as authors of commonsense assessment items. We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning, the Choice of Plausible Alternatives (COPA). We examine the generated items through analyses facilitated by the LLMs themselves and through human annotation. We find that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items.
