AI on AI: Exploring the Utility of GPT as an Expert Annotator of AI Publications

Autumn Toney-Wails; Christian Schoeberl; James Dunham

TL;DR

The results indicate that, with effective prompt engineering, chatbots can serve as reliable data annotators even where subject-area expertise is required. The paper also evaluates the utility of chatbot-annotated datasets on downstream classification tasks.

Abstract

Identifying scientific publications that are within a dynamic field of research often requires costly annotation by subject-matter experts. Resources like widely accepted classification criteria or field taxonomies are unavailable for a domain like artificial intelligence (AI), which spans emerging topics and technologies. We address these challenges by inferring a functional definition of AI research from existing expert labels, and then evaluating state-of-the-art chatbot models on the task of expert data annotation. Using the arXiv publication database as ground truth, we experiment with prompt engineering for GPT chatbot models to identify an alternative, automated expert annotation pipeline that assigns AI labels with 94% accuracy. For comparison, we fine-tune SPECTER, a transformer language model pre-trained on scientific publications, which achieves 96% accuracy (only two percentage points higher than GPT) on classifying AI publications. Our results indicate that with effective prompt engineering, chatbots can be used as reliable data annotators even where subject-area expertise is required. To evaluate the utility of chatbot-annotated datasets on downstream classification tasks, we train a new classifier on GPT-labeled data and compare its performance to the arXiv-trained model. The classifier trained on GPT-labeled data outperforms the arXiv-trained model by nine percentage points, achieving 82% accuracy.
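The annotation-and-scoring loop described in the abstract can be illustrated with a minimal sketch. It makes several assumptions: the prompt wording, the yes/no response format, and the helper names `build_prompt`, `parse_label`, and `accuracy` are all hypothetical and are not taken from the paper. No chatbot API is called here; the model replies are supplied as plain strings.

```python
from typing import Iterable


def build_prompt(title: str, abstract: str) -> str:
    """Hypothetical prompt template; the paper's actual prompts are not reproduced here."""
    return (
        "You are an expert in artificial intelligence research.\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Is this publication AI research? Answer 'yes' or 'no'."
    )


def parse_label(response: str) -> bool:
    """Map a free-text reply to a binary AI label (assumes a yes/no style answer)."""
    return response.strip().lower().startswith("yes")


def accuracy(predicted: Iterable[bool], expected: Iterable[bool]) -> float:
    """Fraction of predicted labels that agree with the ground-truth labels."""
    pairs = list(zip(predicted, expected))
    if not pairs:
        raise ValueError("no labels to compare")
    return sum(p == e for p, e in pairs) / len(pairs)


# Toy data: model replies (assumed) and ground-truth labels (assumed, e.g. derived
# from arXiv subject categories).
replies = ["Yes.", "no", "Yes, it is AI research.", "No."]
ground_truth = [True, False, False, False]

predicted = [parse_label(r) for r in replies]
score = accuracy(predicted, ground_truth)
```

In this toy run, three of the four parsed labels match the ground truth. A real pipeline would send each output of `build_prompt` to a chatbot model and would need stricter handling of replies that do not start with a clear yes or no.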

Paper Structure (21 sections, 4 figures, 8 tables)

Introduction
Background and Motivation
Defining and Identifying AI Research
Large Language Models as Expert Annotators
Experimental Design
Scientific Publication Classifier
Data Annotation Prompt Engineering
Classifier Performance Evaluation
Scholarly Literature Datasets
arXiv Dataset
OpenAlex Dataset
Results and Evaluation
AI-arXiv Classifier
GPT for Data Annotation
AI-GPT Classifier
...and 6 more sections

Figures (4)

Figure 1: Chatbot annotation experimental framework diagram.
Figure 2: Number of AI arXiv publications by publication year. Data were accessed on 10-13-2022, so the 2022 count is incomplete.
Figure 3: GPT annotation example prompts and response.
Figure 4: Median predicted probability of relevance by GPT model across classification types.
