Revisiting Prompt Optimization with Large Reasoning Models-A Case Study on Event Extraction

Saurabh Srivastava; Ziyu Yao

TL;DR

The paper presents the first systematic study of prompt optimization for Large Reasoning Models (LRMs), using end-to-end event extraction as a case study. The authors evaluate LRMs and general-purpose LLMs as both task models and prompt optimizers within a Monte Carlo Tree Search framework. They show that LRMs gain substantially from prompt optimization and often outperform LLMs, including when used as prompt optimizers. The results generalize beyond event extraction to other tasks such as Geometric Shapes and NCBI Disease NER, where LRMs similarly excel as optimizers. An error analysis shows that LRM-optimized prompts reduce common extraction errors and that LRMs converge faster and more stably during optimization, highlighting their potential as both consumers and producers of high-quality prompts across diverse tasks.

Abstract

Large Reasoning Models (LRMs) such as DeepSeek-R1 and OpenAI o1 have demonstrated remarkable capabilities in various reasoning tasks. Their strong capability to generate and reason over intermediate thoughts has also led to arguments that they may no longer require extensive prompt engineering or optimization to interpret human instructions and produce accurate outputs. In this work, we aim to systematically study this open question, using the structured task of event extraction as a case study. We experimented with two LRMs (DeepSeek-R1 and o1) and two general-purpose Large Language Models (LLMs) (GPT-4o and GPT-4.5), each used as a task model or as a prompt optimizer. Our results show that on tasks as complicated as event extraction, LRMs as task models still benefit from prompt optimization, and that using LRMs as prompt optimizers yields more effective prompts. Our findings also generalize to tasks beyond event extraction. Finally, we provide an analysis of common errors made by LRMs and highlight the stability and consistency of LRMs in refining task instructions and event guidelines.
