Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering
Mingxu Tao, Dongyan Zhao, Yansong Feng
TL;DR
This paper introduces Chain-of-Discussion (CoD), a multi-model framework that enables multiple open-source LLMs to summarize, criticize, and revise each other's outputs to improve complex evidence-based QA. By combining two stages—question analysis and evidence analysis—with a structured critique and revision loop, CoD enhances correctness and comprehensiveness, particularly in legal consultation tasks. The authors provide a Chinese civil-law dataset (200 questions) and show that CoD yields improvements on evidence-centric metrics and human evaluations, though performance varies by model size and capability. The work demonstrates the viability of collaborative reasoning among small LLMs to mitigate hallucination and broaden scenario coverage, offering a practical path toward reliable, evidence-grounded open-source QA systems.
Abstract
Open-ended question answering requires models to find appropriate evidence to form well-reasoned, comprehensive and helpful answers. In practical applications, models also need to engage in extended discussions on potential scenarios closely relevant to the question. When augmented with a retrieval module, open-source Large Language Models (LLMs) can produce coherent answers, often with different focuses, but they remain sub-optimal in terms of reliable evidence selection and in-depth question analysis. In this paper, we propose a novel Chain-of-Discussion framework that leverages the synergy among multiple open-source LLMs, none of which is strong enough individually, to provide more correct and more comprehensive answers for open-ended QA. Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
