Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection
Steven Fincke, Adrien Bibal, Elizabeth Boschee
TL;DR
This work addresses confidence estimation in few-shot event detection using large language models by introducing License to Speculate and Opportunity (L&O) prompting. L&O expands prompts to elicit guesses, explanations, and a 1–5 confidence rating from GPT-4, without model fine-tuning or access to internal statistics. The approach yields usable confidence measures and improves F1 on select BETTER ontology topics, achieving ROC AUC up to 0.759 and demonstrating the value of explanations for calibration. The results suggest that explicitly enabling speculation and justification in prompts can make LLM-based annotation pipelines more reliable and scalable for ontology development and silver-data generation.
Abstract
Large Language Models (LLMs) such as GPT-4 have shown enough promise in the few-shot learning context to suggest their use in generating "silver" data and refining new ontologies through iterative application and review. Such workflows become more effective with reliable confidence estimation. Unfortunately, confidence estimation is a documented weakness of models such as GPT-4, and established methods to compensate require significant additional complexity and computation. The present effort explores methods for effective confidence estimation with GPT-4, using few-shot event detection in the BETTER ontology as a vehicle. The key innovation is expanding the prompt and task presented to GPT-4 to provide License to speculate when unsure and Opportunity to quantify and explain its uncertainty (L&O). This approach improves accuracy and provides usable confidence measures (0.759 AUC) with no additional machinery.
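To make the reported metric concrete, the following is a minimal sketch of how a ROC AUC could be computed from self-reported 1–5 confidence ratings and per-prediction correctness labels. It is not the authors' evaluation code: the function name `roc_auc`, the tie handling, and the toy data are all assumptions made for illustration. The AUC here is the probability that a randomly chosen correct prediction received a higher rating than a randomly chosen incorrect one, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count as 0.5."""
    positives = [c for c, ok in zip(confidences, correct) if ok]
    negatives = [c for c, ok in zip(confidences, correct) if not ok]
    if not positives or not negatives:
        raise ValueError("need at least one correct and one incorrect prediction")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correct prediction was rated more confident
            elif p == n:
                wins += 0.5   # coarse 1-5 scales produce many ties
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the paper): 1-5 ratings and correctness flags.
ratings = [5, 4, 4, 2, 1, 3]
is_correct = [True, True, False, False, False, True]
auc = roc_auc(ratings, is_correct)
print(f"ROC AUC: {auc:.3f}")
```

Because a 1–5 scale has only five distinct values, ties are frequent, so how they are scored noticeably affects the result; counting them as one half matches the standard trapezoidal ROC construction.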
