Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks
Dharunish Yugeswardeenoo, Kevin Zhu, Sean O'Brien
TL;DR
Question-Analysis Prompting (QAP) is a zero-shot prompting strategy that asks an LLM to explain the question in at least $n$ words before solving it, which makes the model's interpretation of the task explicit. On GPT-3.5 Turbo and GPT-4 Turbo, QAP variants outperform state-of-the-art prompts on AQuA and SAT and rank in the top 2 in about 75% of tests. Longer explanations help on harder problems but can hurt on easier ones. The approach emphasizes the model's understanding of the question and shows that response length matters in reasoning tasks, with a trade-off between depth of analysis and simplicity of the final answer. The work suggests combining QAP with other prompting and decoding strategies and extending it to multimodal tasks, and it acknowledges limitations in prompt sensitivity and in the range of datasets and models evaluated.
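The core of QAP is a single instruction placed before the model solves the problem. The sketch below builds such a prompt for a given question and word budget $n$. The instruction wording, the `Q:`/`A:` layout, and the helper name `build_qap_prompt` are assumptions for illustration, paraphrased from the "explain the question in at least $n$ words before solving" description rather than copied from the authors' prompt.

```python
def build_qap_prompt(question: str, n: int) -> str:
    """Build a zero-shot, QAP-style prompt (hypothetical helper).

    The model is asked to explain the question in at least n words and
    only then solve it. The exact wording and layout are assumptions.
    """
    if n <= 0:
        raise ValueError("n must be a positive word count")
    # Analysis-before-solving instruction; n controls how long the
    # explanation (and hence the whole response) tends to be.
    instruction = (
        f"Explain this problem to me in at least {n} words. "
        "Then solve for the answer."
    )
    # The question comes first, followed by the instruction in the answer slot.
    return f"Q: {question}\nA: {instruction}"
```

Different QAP variants then differ only in the value passed as `n`. A larger `n` pushes the model toward a longer analysis, which is the response-length trade-off described above.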
Abstract
Although LLMs have the potential to transform many fields, they still underperform humans on reasoning tasks. Existing methods induce the model to produce step-by-step calculations; this research instead asks: does making the LLM analyze the question improve its performance? We propose a novel prompting strategy called Question-Analysis Prompting (QAP), in which the model is prompted to explain the question in at least $n$ words before solving it. The value of $n$ influences the length of the response generated by the model. We evaluate QAP on GPT-3.5 Turbo and GPT-4 Turbo using the arithmetic datasets GSM8K, AQuA, and SAT and the commonsense dataset StrategyQA. QAP is compared with other state-of-the-art prompts, including Chain-of-Thought (CoT), Plan-and-Solve Prompting (PS+), and Take A Deep Breath (TADB). QAP outperforms all of these prompts on the AQuA and SAT datasets on both GPT-3.5 and GPT-4, and it consistently ranks among the top-2 prompts on 75% of the tests. A key factor in QAP's performance is response length: detailed responses are beneficial for harder questions but can negatively affect easier ones.
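The "top-2 on 75% of the tests" claim is a rank-frequency statistic. The sketch below shows one way to compute it from a table of accuracies. The data layout, the helper name `top_k_rate`, and the tie rule (a prompt counts as top-$k$ if fewer than $k$ prompts score strictly higher) are all assumptions, since the abstract does not specify how ties or tests are defined.

```python
from typing import Mapping


def top_k_rate(
    results: Mapping[str, Mapping[str, float]], prompt: str, k: int = 2
) -> float:
    """Fraction of tests in which `prompt` ranks among the top-k prompts.

    `results` maps a test identifier (hypothetically, a model/dataset pair)
    to the accuracy of each prompt on that test. Ties use competition
    ranking: a prompt is in the top k if fewer than k prompts beat it.
    """
    if not results:
        raise ValueError("results must contain at least one test")
    hits = 0
    for scores in results.values():
        target = scores[prompt]
        # Count prompts with strictly higher accuracy on this test.
        better = sum(1 for s in scores.values() if s > target)
        if better < k:
            hits += 1
    return hits / len(results)
```

Suppose a test is one model/dataset pair, which is an assumption. Then the two models and four datasets give eight tests, and 75% would correspond to six of them.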
