Prompting Implicit Discourse Relation Annotation
Frances Yung, Mansoor Ahmad, Merel Scholman, Vera Demberg
TL;DR
This work examines whether prompting strategies can enable zero-/few-shot annotation of implicit discourse relations (DRs) with GPT-4. It focuses on breaking a 14-way DR classification into smaller tasks in three ways: two-step discourse connective (DC) insertion, per-class binary prompts, and per-class verification prompts. On the PDTB 3.0 and DiscoGeM datasets, these prompt designs yield only limited gains, and GPT-4's implicit DR predictions remain substantially behind the supervised state of the art. The results indicate that implicit DR recognition may not be solvable in zero-/few-shot settings without explicit supervision or additional signals, although some prompts show potential for multi-label annotation and give insight into prompt-dependent behavior. The study highlights the gap between prompt-driven reasoning and the task-specific supervision that fine-grained linguistic classification appears to need, with implications for crowd-sourced annotation pipelines and future prompt engineering efforts.
Abstract
Pre-trained large language models, such as ChatGPT, achieve outstanding performance in various reasoning tasks without supervised training and have been reported to outperform crowdsourcing workers. Nonetheless, ChatGPT's performance in the task of implicit discourse relation classification, prompted by a standard multiple-choice question, is still far from satisfactory and considerably inferior to state-of-the-art supervised approaches. This work investigates several proven prompting techniques to improve ChatGPT's recognition of discourse relations. In particular, we experimented with breaking down the classification task, which involves numerous abstract labels, into smaller subtasks. However, experimental results show that the inference accuracy hardly changes even with sophisticated prompt engineering, suggesting that implicit discourse relation classification is not yet solvable under zero-shot or few-shot settings.
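To make the idea of splitting one many-way classification into smaller subtasks concrete, the following is a minimal sketch of a per-class yes/no decomposition. It is not the authors' implementation: the prompt wording, the three-label subset, and the names `build_binary_prompts`, `annotate`, and `fake_model` are all assumptions introduced for illustration, and the model call is replaced by a stand-in function so the example runs without any API.

```python
from typing import Callable, Dict, List

# Illustrative subset only (assumption); the 14-way label set used in the
# paper is not reproduced here.
LABELS = ["cause", "concession", "conjunction"]


def build_binary_prompts(arg1: str, arg2: str, labels: List[str]) -> Dict[str, str]:
    """Return one yes/no prompt per candidate label (hypothetical wording)."""
    return {
        label: (
            f"Argument 1: {arg1}\n"
            f"Argument 2: {arg2}\n"
            f"Does a '{label}' relation hold between the two arguments? "
            "Answer yes or no."
        )
        for label in labels
    }


def annotate(
    arg1: str,
    arg2: str,
    labels: List[str],
    ask: Callable[[str], str],
) -> List[str]:
    """Keep every label whose binary prompt is answered with 'yes'.

    Because each label is queried independently, zero, one, or several
    labels can be returned for the same argument pair.
    """
    prompts = build_binary_prompts(arg1, arg2, labels)
    return [
        label
        for label, prompt in prompts.items()
        if ask(prompt).strip().lower().startswith("yes")
    ]


def fake_model(prompt: str) -> str:
    """Stand-in for a real language model call (assumption for the demo)."""
    return "yes" if "'cause'" in prompt else "no"


if __name__ == "__main__":
    result = annotate("It rained.", "The match was cancelled.", LABELS, fake_model)
    print(result)
```

Querying each label separately is what allows a multi-label outcome, at the cost of one model call per label for every argument pair.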
