Brainstorming Brings Power to Large Language Models of Knowledge Reasoning
Zining Qin, Chenhao Wang, Huiling Qin, Weijia Jia
TL;DR
This work tackles the challenge of reasoning bias and instability from single-model outputs by introducing a prompt-based multi-model brainstorming framework that aggregates diverse LLM perspectives through iterative reasoning rounds. By defining a consensus mechanism and employing dialog truncation, the approach facilitates robust, interpretable collaboration among heterogeneous models, achieving notable accuracy gains on GSM and ARC tasks and enabling smaller models to approximate larger ones. The findings demonstrate the practical value of cross-model knowledge exchange for knowledge reasoning and point to scalable deployment benefits where distributed, smaller models can collectively match larger capabilities. The proposed method offers a viable path to improve reasoning reliability in real-world deployments while mitigating some of the cost and bias issues associated with single-model inference.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in language generation, text comprehension, and knowledge reasoning. While a single powerful model can already handle multiple tasks, relying on a single perspective can lead to biased and unstable results. Recent studies have further improved reasoning ability on a wide range of tasks by introducing multi-model collaboration. However, models with different capabilities may produce conflicting answers to the same problem, and how to reliably obtain the correct answer from multiple candidate models remains a challenging problem. In this paper, we propose prompt-based multi-model brainstorming. It incorporates different models into a group for brainstorming, and after multiple rounds of reasoning elaboration and re-inference, the group reaches a consensus answer. We conduct experiments on three different types of datasets and demonstrate that brainstorming can significantly improve effectiveness in logical reasoning and fact extraction. Furthermore, we find that two small-parameter models can achieve accuracy approximating that of larger-parameter models through brainstorming, which provides a new solution for distributed deployment of LLMs.
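
The brainstorming loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `brainstorm` and `build_prompt`, the prompt wording, the unanimous-agreement stopping rule, the majority-vote fallback, and the choice to carry forward only the latest round of rationales are all assumptions made for illustration. Each "model" is a plain callable standing in for an LLM call, and two stub models replace real ones.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Assumption: a "model" is any callable mapping a prompt to (answer, rationale).
Model = Callable[[str], Tuple[str, str]]


def build_prompt(question: str, peer_rationales: List[str]) -> str:
    """Hypothetical prompt template; the paper's actual wording is not given here."""
    if not peer_rationales:
        return f"Question: {question}\nAnswer with your reasoning."
    peers = "\n".join(f"- {r}" for r in peer_rationales)
    return (
        f"Question: {question}\n"
        f"Other group members reasoned as follows:\n{peers}\n"
        "Reconsider and give your final answer with reasoning."
    )


def brainstorm(question: str, models: List[Model], max_rounds: int = 3) -> Tuple[str, int]:
    """Run rounds of re-inference until all models agree or max_rounds is reached.

    Returns (answer, number_of_rounds_used).
    """
    rationales: List[str] = []
    answers: List[str] = []
    for round_idx in range(1, max_rounds + 1):
        replies = []
        for i, model in enumerate(models):
            # Each model sees only its peers' rationales from the previous round.
            peers = [r for j, r in enumerate(rationales) if j != i]
            replies.append(model(build_prompt(question, peers)))
        answers = [a for a, _ in replies]
        # Assumed form of dialog truncation: keep only the latest round.
        rationales = [r for _, r in replies]
        if len(set(answers)) == 1:  # assumed consensus rule: unanimity
            return answers[0], round_idx
    # Assumed fallback: majority vote; ties go to the first-seen answer.
    return Counter(answers).most_common(1)[0][0], max_rounds


# Stub models standing in for real LLM calls.
def steady_model(prompt: str) -> Tuple[str, str]:
    return "4", "2 + 2 equals 4 by direct addition."


def persuadable_model(prompt: str) -> Tuple[str, str]:
    # Starts with a wrong answer, then revises once it sees peer reasoning.
    if "Other group members" in prompt:
        return "4", "After reviewing the peer reasoning, 2 + 2 is 4."
    return "5", "I estimate 2 + 2 to be 5."


result = brainstorm("What is 2 + 2?", [steady_model, persuadable_model])
print(result)
```

With these stubs the two models disagree in the first round and agree in the second, so the loop stops early. Replacing the stubs with real model calls only requires wrapping each call so that it returns an answer and a rationale.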
