Large Language Models are Contrastive Reasoners
Liang Yao
TL;DR
The paper introduces Contrastive Prompting (CP), a template-based prompting approach that asks a large language model to produce both a correct and an incorrect answer, guiding its reasoning without task-specific labeled demonstrations. CP runs in two stages, reasoning extraction followed by answer extraction, using self-augmented prompts, and it can be combined with existing prompting methods (X-CP). Experiments on GPT-4, GPT-3.5-Turbo, and open LLMs show substantial improvements over zero-shot prompting, and often over zero-shot-CoT, on arithmetic, commonsense, and symbolic tasks. With GPT-4, accuracy rises from 35.9% to 88.8% on GSM8K and from 41.3% to 62.2% on AQUA-RAT relative to standard zero-shot prompting. Combined with other prompting strategies, CP can approach or surpass state-of-the-art results. The method is simple to deploy, works across models, and has publicly available code. The paper also examines how the prompt template and the number of generated incorrect answers affect performance, discusses limitations, and outlines future work on smaller models and deeper integration with advanced prompting techniques.
Abstract
Prompting methods play a crucial role in enhancing the capabilities of pre-trained large language models (LLMs). We explore how contrastive prompting (CP) significantly improves the ability of large language models to perform complex reasoning. We demonstrate that LLMs are decent contrastive reasoners by simply adding "Let's give a correct and a wrong answer." before LLMs provide answers. Experiments on various large language models show that zero-shot contrastive prompting improves the performance of standard zero-shot prompting on a range of arithmetic, commonsense, and symbolic reasoning tasks without any hand-crafted few-shot examples, such as increasing the accuracy on GSM8K from 35.9% to 88.8% and AQUA-RAT from 41.3% to 62.2% with the state-of-the-art GPT-4 model. Our method not only surpasses zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks but also can seamlessly integrate with existing prompting methods, resulting in improved or comparable results when compared to state-of-the-art methods. Our code is available at https://github.com/yao8839836/cp
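The two-stage process (reasoning extraction, then answer extraction) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: only the trigger sentence is taken from the abstract. The `Q:`/`A:` layout, the answer cue `ANSWER_CUE`, the function names, and the `llm` callable (any function mapping a prompt string to a completion string) are assumptions made for this example.

```python
from typing import Callable

# Trigger sentence quoted in the abstract.
CP_TRIGGER = "Let's give a correct and a wrong answer."

# Hypothetical answer cue; the exact wording used in the paper may differ.
ANSWER_CUE = "Therefore, the correct answer is"


def build_reasoning_prompt(question: str) -> str:
    """Stage 1 prompt: the question followed by the contrastive trigger."""
    # The "Q: ... A: ..." layout is an assumption for this sketch.
    return f"Q: {question}\nA: {CP_TRIGGER}"


def build_answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2 prompt: the stage 1 prompt, the model's own output, then a cue."""
    return f"{build_reasoning_prompt(question)} {reasoning.strip()}\n{ANSWER_CUE}"


def contrastive_prompt(question: str, llm: Callable[[str], str]) -> str:
    """Run both stages with a caller-supplied completion function."""
    # Stage 1: elicit reasoning containing a correct and a wrong answer.
    reasoning = llm(build_reasoning_prompt(question))
    # Stage 2: feed that reasoning back (the self-augmented prompt)
    # and ask for the final answer only.
    answer = llm(build_answer_prompt(question, reasoning))
    return answer.strip()
```

Passing the model in as a plain callable keeps the sketch independent of any particular API client. In practice `llm` would wrap a call to whichever model is being evaluated.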
