Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation
Jihyun Janice Ahn, Ryo Kamoi, Lu Cheng, Rui Zhang, Wenpeng Yin
TL;DR
The paper addresses generative uncertainty in large language models (LLMs) by introducing Direct-Inverse Discriminative Prompting, which uses three prompts (Direct, Inverse, and Combination) to select the most certain answer from multiple generations. The prompts are evaluated on two math-focused datasets (MATH and MathQA) across four LLMs (two closed-source, two open-source) and compared against Chain-of-Thought and Universal Self-Consistency baselines. Discriminative prompts substantially improve performance for closed-source LLMs, with Direct and Inverse often outperforming the baselines. Open-source models benefit primarily from the Direct prompt because they struggle to understand negation, and the MetaMath model generally underperforms with these prompts. The results offer practical guidance on when to deploy each discriminative prompt to reduce self-generated uncertainty, and they motivate broader evaluation across more models and domains.
Abstract
Mainstream LLM research has primarily focused on enhancing the generative capabilities of LLMs. However, even the most advanced LLMs exhibit uncertainty in their outputs, often producing different results across runs or in response to minor input changes that do not alter the content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLM's own discriminative capability to reduce this generative uncertainty and help identify the correct answer. Specifically, we propose and analyze three discriminative prompts (direct, inverse, and hybrid) to explore the potential of both closed-source and open-source LLMs to self-improve their generative performance on two benchmark datasets. Our insights reveal which discriminative prompt is most promising and when to use it. To our knowledge, this is the first work to systematically analyze LLMs' discriminative capacity to address generative uncertainty.
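
To make the selection idea concrete, the following is a minimal Python sketch of how direct, inverse, and combined selection over candidate answers might be wired together. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the label format, the agreement rule used for the combined variant, and all names (`direct_select`, `inverse_select`, `combined_select`, `toy_judge`) are hypothetical. The `ask` argument stands in for any function that sends a prompt to an LLM and returns its text reply.

```python
import re
from typing import Callable, List, Optional, Set

LABELS = "ABCDEFGHIJ"  # assumption: at most 10 distinct candidate answers


def format_candidates(candidates: List[str]) -> str:
    """Render candidates as labeled options, e.g. '(A) 12'."""
    return "\n".join(f"({LABELS[i]}) {c}" for i, c in enumerate(candidates))


def parse_labels(reply: str, n: int) -> Set[int]:
    """Return the indices of labels such as '(A)' mentioned in the reply."""
    found = re.findall(r"\(([A-J])\)", reply)
    return {LABELS.index(x) for x in found if LABELS.index(x) < n}


def direct_select(question: str, candidates: List[str],
                  ask: Callable[[str], str]) -> Optional[int]:
    """Direct prompt (assumed wording): ask which candidate is correct."""
    prompt = (f"Question: {question}\n{format_candidates(candidates)}\n"
              "Which candidate answer is correct? Reply with one label, e.g. (A).")
    picked = parse_labels(ask(prompt), len(candidates))
    return picked.pop() if len(picked) == 1 else None


def inverse_select(question: str, candidates: List[str],
                   ask: Callable[[str], str]) -> Optional[int]:
    """Inverse prompt (assumed wording): ask which candidates are incorrect,
    then keep the single candidate that was not rejected, if there is one."""
    prompt = (f"Question: {question}\n{format_candidates(candidates)}\n"
              "Which candidate answers are incorrect? Reply with labels, e.g. (A), (B).")
    rejected = parse_labels(ask(prompt), len(candidates))
    remaining = set(range(len(candidates))) - rejected
    return remaining.pop() if len(remaining) == 1 else None


def combined_select(question: str, candidates: List[str],
                    ask: Callable[[str], str]) -> Optional[int]:
    """Assumed combination rule: accept an answer only if both prompts agree."""
    d = direct_select(question, candidates, ask)
    i = inverse_select(question, candidates, ask)
    return d if d is not None and d == i else None


def toy_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it always favors option (B)."""
    if "incorrect" in prompt:  # check first, since "incorrect" contains "correct"
        return "(A), (C)"
    return "(B)"


if __name__ == "__main__":
    answers = ["12", "15", "18"]  # distinct final answers from repeated generations
    print(combined_select("What is 3 * 5?", answers, toy_judge))
```

Each function returns the index of the selected candidate, or `None` when the reply does not single out exactly one answer. In that case a caller could fall back to another strategy such as majority voting. That fallback is a design choice of this sketch rather than something stated in the abstract.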
