Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models
Xiaolong Wang, Yile Wang, Yuanchi Zhang, Fuwen Luo, Peng Li, Maosong Sun, Yang Liu
TL;DR
The paper tackles the challenge of subjective reasoning in LLMs, where interpretation and emotion play a central role and traditional chain-of-thought prompts often fall short. It introduces RiC, a tuning-free method that solves subjective tasks via dialogue simulation in three steps: keyword extraction, dialogue-based scenario construction, and dialogue-enhanced reasoning, with an optional unified prompting variant. Across twelve subjective datasets and multiple models (GPT-4, ChatGPT, OpenChat), RiC delivers significant improvements over strong baselines in both zero-shot and few-shot settings, highlighting the value of dialogue-derived contextual knowledge. The work demonstrates that simulating human-like dialogues can reveal useful information behind questions, offering a scalable and practical approach to improving subjective reasoning in LLMs and guiding future benchmark and domain-specific adaptations.
Abstract
Large Language Models (LLMs) have achieved remarkable performance on objective tasks such as open-domain question answering and mathematical reasoning, which can often be solved by recalling learned factual knowledge or by chain-of-thought style reasoning. However, we find that the performance of LLMs on subjective tasks, such as metaphor recognition and dark humor detection, is still unsatisfactory. Compared to objective tasks, subjective tasks focus more on interpretation or emotional response than on a universally accepted reasoning pathway. Based on the characteristics of these tasks and the strong dialogue-generation capabilities of LLMs, we propose RiC (Reasoning in Conversation), a method that solves subjective tasks through dialogue simulation. The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales, so that the potentially useful knowledge behind the dialogues can inform the final answer. We evaluate both API-based and open-source LLMs, including GPT-4, ChatGPT, and OpenChat, across twelve tasks. Experimental results show that RiC can yield significant improvements over various baselines.
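The three stages named in the TL;DR (keyword extraction, dialogue-based scenario construction, dialogue-enhanced reasoning) can be read as a chain of three prompts. The following is a minimal sketch of that reading, not the authors' implementation: the prompt wording, the function names, and the `LLM` callable type are all assumptions made for illustration, and the paper's actual prompts are not reproduced here.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def extract_keywords(llm: LLM, question: str) -> str:
    """Stage 1 (assumed prompt): ask the model for keywords relevant to the question."""
    prompt = (
        "List a few keywords that capture the topic and tone of this question.\n\n"
        f"Question: {question}\nKeywords:"
    )
    return llm(prompt).strip()


def construct_dialogue(llm: LLM, question: str, keywords: str) -> str:
    """Stage 2 (assumed prompt): simulate a short dialogue grounded in the keywords."""
    prompt = (
        "Write a short, natural dialogue between two people in which the "
        "following text could plausibly be said.\n\n"
        f"Text: {question}\nKeywords: {keywords}\nDialogue:"
    )
    return llm(prompt).strip()


def reason_with_dialogue(llm: LLM, question: str, dialogue: str) -> str:
    """Stage 3 (assumed prompt): answer the question using the dialogue as context."""
    prompt = (
        "Use the dialogue below as context to answer the question.\n\n"
        f"Dialogue:\n{dialogue}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt).strip()


def ric_answer(llm: LLM, question: str) -> str:
    """Chain the three stages; each stage's output feeds the next prompt."""
    keywords = extract_keywords(llm, question)
    dialogue = construct_dialogue(llm, question, keywords)
    return reason_with_dialogue(llm, question, dialogue)
```

Passing the model as a plain callable keeps the sketch independent of any particular API client, so the same chain could in principle wrap GPT-4, ChatGPT, or OpenChat. The unified prompting variant mentioned in the TL;DR would presumably collapse these stages into a single prompt; that variant is not sketched here.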
