APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT

Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui

TL;DR

APT-Pipe is an automated, modular pipeline that tunes ChatGPT prompts for social computing text classification using a small annotated subset of the data. It works in three steps (JSON prompt initialization, exemplar-based few-shot tuning, and NLP-metric augmentation) and uses an iterative strategy to select metrics. Tuned prompts raise the weighted F1-score on nine of twelve datasets, with an average improvement of 7.01%, and yield highly parsable output. The framework is also extended with Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT) demonstrations, which show potential for further gains and illustrate its extensibility. Overall, the work reduces manual prompt engineering while improving the accuracy and efficiency of ChatGPT-based annotation in social computing.

Abstract

Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning: techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve a higher weighted F1-score on nine of the twelve datasets tested, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.
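
The abstract reports results as weighted F1-scores. For readers unfamiliar with the metric, the following is a minimal sketch of how a weighted F1-score is commonly computed: the per-class F1 is averaged, with each class weighted by its share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels, made up for illustration only.
    gold = ["clickbait", "clickbait", "news", "news"]
    pred = ["clickbait", "news", "news", "news"]
    print(round(weighted_f1(gold, pred), 4))
```

Because each class is weighted by its support, this metric reflects label imbalance, which is common in social computing datasets.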

Paper Structure

This paper contains 24 sections, 3 figures, 7 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Flow chart of Step 1. The prompt on the right-hand side shows example text for clickbait news headline detection.
  • Figure 2: Flow chart of Step 2 (prompt-tuning with few-shot learning).
  • Figure 3: Flow chart of Step 3 (prompt-tuning with NLP metrics).
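
The three steps named in the figure captions can be pictured with a rough sketch. The code below is not the authors' implementation. It only illustrates, under stated assumptions, what a JSON-style prompt for clickbait headline detection might look like once few-shot exemplars and NLP metrics are optionally attached, and how a JSON reply could be checked for parsability. All field names (`task`, `labels`, `examples`, `metrics`, `response_format`), both helper functions, and the example metric value are hypothetical. No ChatGPT call is made.

```python
import json


def build_prompt(text, labels, task, exemplars=None, metrics=None):
    """Assemble a JSON-style prompt (hypothetical schema, for illustration)."""
    prompt = {
        "task": task,
        "labels": labels,
        "text": text,
        # Ask for a JSON reply so the answer is easy to parse.
        "response_format": {"label": "<one of labels>"},
    }
    if exemplars:  # Step 2 idea: attach labeled few-shot exemplars
        prompt["examples"] = [{"text": t, "label": l} for t, l in exemplars]
    if metrics:  # Step 3 idea: attach NLP metrics computed for the text
        prompt["metrics"] = metrics
    return json.dumps(prompt, indent=2)


def parse_response(raw, labels):
    """Return the label from a JSON reply, or None if it is not parsable."""
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError):
        return None
    return label if label in labels else None


if __name__ == "__main__":
    labels = ["clickbait", "not clickbait"]
    print(build_prompt(
        "You won't believe what happened next",
        labels,
        "Classify the news headline.",
        exemplars=[("10 tricks doctors hate", "clickbait")],
        metrics={"sentiment": 0.4},  # made-up value
    ))
    print(parse_response('{"label": "clickbait"}', labels))
```

Keeping the exemplars and metrics optional mirrors the modular idea in the summary: each tuning step can be switched on or off depending on whether it helps on the annotated subset.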