Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter

Junhao Chen, Bowen Wang, Zhouqiang Jiang, Yuta Nakashima

TL;DR

Vague user questions degrade question answering with black-box LLMs. The authors propose the question rewriter, a single-round, instance-level prompt optimization method trained with direct preference optimization (DPO) on feedback from automatic evaluation criteria, which avoids costly human annotation. The method samples rewritten questions, builds better/worse pairs using automatic metrics, and applies DPO. It improves answer quality across multiple long-form question answering (LFQA) datasets and model types, with evidence of cross-model generalization. The framework shows that LFQA prompt optimization is practical without human feedback and points toward domain-robust question rewriting for real-world QA tasks.
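The pair-construction step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `build_preference_pairs`, `answer_fn`, `score_fn`, `n_better`, and `n_worse` are hypothetical names, and using the original question's score as the baseline that separates better from worse rewrites is an assumption made for this sketch.

```python
from itertools import product
from typing import Callable, List, Tuple


def build_preference_pairs(
    original: str,
    rewrites: List[str],
    answer_fn: Callable[[str], str],        # hypothetical: queries the black-box LLM
    score_fn: Callable[[str, str], float],  # hypothetical: automatic criterion (question, answer) -> score
    n_better: int = 2,                      # assumed cap on "better" rewrites kept
    n_worse: int = 2,                       # assumed cap on "worse" rewrites kept
) -> List[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected) triples suitable for DPO training."""
    # Assumption: the answer to the original question sets the baseline score.
    baseline = score_fn(original, answer_fn(original))

    # Score the answer that each sampled rewrite elicits, always judged
    # against the original question.
    scored = [(score_fn(original, answer_fn(q)), q) for q in rewrites]

    # Keep the highest-scoring rewrites above the baseline and the
    # lowest-scoring rewrites below it.
    better = sorted((s for s in scored if s[0] > baseline), reverse=True)[:n_better]
    worse = sorted(s for s in scored if s[0] < baseline)[:n_worse]

    # Every (better, worse) combination becomes one preference pair.
    return [(original, b, w) for (_, b), (_, w) in product(better, worse)]
```

With a toy `answer_fn` that echoes the question and a toy `score_fn` that returns the answer length, longer rewrites end up as `chosen` and shorter ones as `rejected`. In practice both callables would wrap an LLM call and a dataset-specific automatic metric.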

Abstract

Large Language Models (LLMs) have demonstrated significant capabilities, particularly in the domain of question answering (QA). However, their effectiveness in QA is often undermined by the vagueness of user questions. To address this issue, we introduce single-round instance-level prompt optimization, referred to as question rewriter. By enhancing the intelligibility of human questions for black-box LLMs, our question rewriter improves the quality of generated answers. The rewriter is optimized using direct preference optimization based on feedback collected from automatic criteria for evaluating generated answers; therefore, its training does not require costly human annotations. The experiments across multiple black-box LLMs and long-form question answering (LFQA) datasets demonstrate the efficacy of our method. This paper provides a practical framework for training question rewriters and sets a precedent for future explorations in prompt optimization within LFQA tasks. Code is available at https://github.com/3244we/Question-Rewriter.
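For readers unfamiliar with direct preference optimization, the standard DPO objective for a single preference pair can be sketched as below. This shows the general DPO loss, not code from the paper's repository. The function name, the argument names, and the default `beta=0.1` are assumptions for illustration. In this setting the "chosen" and "rejected" sequences would be the better and worse rewritten questions, and the log-probabilities would come from the rewriter being trained and a frozen reference copy of it.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,    # log pi_theta(chosen | prompt)
    policy_rejected_logp: float,  # log pi_theta(rejected | prompt)
    ref_chosen_logp: float,       # log pi_ref(chosen | prompt)
    ref_rejected_logp: float,     # log pi_ref(rejected | prompt)
    beta: float = 0.1,            # assumed temperature; not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # Implicit rewards are log-ratios between the policy and the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # -log(sigmoid(m)) == log(1 + exp(-m)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model, the margin is zero and the loss is log 2. The loss falls as the policy assigns relatively more probability to the chosen rewrite than the reference does.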

Paper Structure (28 sections, 9 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: The original questions posed by the user are difficult for black-box LLMs to understand, resulting in poor answers. However, when the questions are rewritten by the rewriter, they become easier for LLMs to comprehend, leading to better answers.
  • Figure 2: Pipeline of our method.
  • Figure 3: Evaluating the impact of $N_+$ and $N_-$ on the performance over K-QA and OASST1QA.
  • Figure 4: The importance and impact of attributes conciseness (Conc.), structure (Strt.), word choice (WC), emotion (Emo.), non-leadingness (NL), grammar and spelling (G&S), neutrality (Neut.), tone, clarity (Clar.), and politeness (Pol.). The bar plots show importance, while the line plots show impact.