Brainstorming Brings Power to Large Language Models of Knowledge Reasoning

Zining Qin, Chenhao Wang, Huiling Qin, Weijia Jia

TL;DR

This work tackles the challenge of reasoning bias and instability from single-model outputs by introducing a prompt-based multi-model brainstorming framework that aggregates diverse LLM perspectives through iterative reasoning rounds. By defining a consensus mechanism and employing dialog truncation, the approach facilitates robust, interpretable collaboration among heterogeneous models, achieving notable accuracy gains on GSM and ARC tasks and enabling smaller models to approximate larger ones. The findings demonstrate the practical value of cross-model knowledge exchange for knowledge reasoning and point to scalable deployment benefits where distributed, smaller models can collectively match larger capabilities. The proposed method offers a viable path to improve reasoning reliability in real-world deployments while mitigating some of the cost and bias issues associated with single-model inference.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in language generation, text comprehension, and knowledge reasoning. While a single powerful model can already handle multiple tasks, relying on a single perspective can lead to biased and unstable results. Recent studies have further improved reasoning ability on a wide range of tasks by introducing multi-model collaboration. However, models with different capabilities may produce conflicting answers to the same problem, and how to reliably obtain the correct answer from multiple candidate models remains a challenging problem. In this paper, we propose prompt-based multi-model brainstorming. It places different models into a group for brainstorming, and after multiple rounds of reasoning elaboration and re-inference, the group reaches a consensus answer. We conduct experiments on three different types of datasets and demonstrate that brainstorming can significantly improve performance in logical reasoning and fact extraction. Furthermore, we find that two small-parameter models can achieve accuracy approximating that of larger-parameter models through brainstorming, which provides a new solution for distributed deployment of LLMs.
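
The group loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `brainstorm`, `stubborn`, and `follower` are hypothetical, each "model" is a plain Python callable standing in for a prompted LLM, consensus is taken to mean that all answers match exactly, only the latest round of peer opinions is passed on (one simple reading of dialog truncation), and the majority-vote fallback after the round limit is an added assumption.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical types: an opinion is (answer, reasoning); a model maps
# (question, peer opinions) to its own opinion. A real system would wrap
# a prompted LLM call here.
Opinion = Tuple[str, str]
Model = Callable[[str, List[Opinion]], Opinion]


def brainstorm(question: str, models: List[Model], max_rounds: int = 3) -> Tuple[str, int]:
    """Return (final answer, rounds used) for a group of models."""
    # Round 1: every model answers independently, with no peer opinions.
    opinions = [model(question, []) for model in models]
    rounds_used = 1
    # Keep re-inferring while the group disagrees and rounds remain.
    while len({answer for answer, _ in opinions}) > 1 and rounds_used < max_rounds:
        # Each model sees only the latest opinions of the other models
        # (assumed form of dialog truncation) and answers again.
        opinions = [
            model(question, opinions[:i] + opinions[i + 1:])
            for i, model in enumerate(models)
        ]
        rounds_used += 1
    # On consensus all answers are equal; otherwise fall back to the most
    # common answer (an assumption, not taken from the paper).
    answers = [answer for answer, _ in opinions]
    return Counter(answers).most_common(1)[0][0], rounds_used


# Two toy stand-ins for LLMs, used only to exercise the loop.
def stubborn(question: str, peers: List[Opinion]) -> Opinion:
    return "18", "16 - 3 - 4 = 9 eggs left, 9 * 2 = 18"


def follower(question: str, peers: List[Opinion]) -> Opinion:
    if not peers:
        return "20", "initial guess"
    top_answer = Counter(answer for answer, _ in peers).most_common(1)[0][0]
    return top_answer, "adopting the peers' reasoning"


result = brainstorm("toy question", [stubborn, follower])
print(result)  # ('18', 2): disagreement in round 1, consensus in round 2
```

The toy models disagree in the first round, and the second model changes its answer after reading its peer's reasoning, so the group converges in the second round. Swapping the callables for real LLM calls would also require an answer-extraction step, which is omitted here.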

Paper Structure (12 sections, 12 figures, 3 tables)

Figures (12)

  • Figure 1: Differences among three types of LLM collaboration for obtaining an answer: voting, strong-model judgment (review), and brainstorming.
  • Figure 2: An illustration of LLM Brainstorming. When multiple models are involved in the inference process, they can discuss with each other and eventually reach a consensus judgment.
  • Figure 3: The accuracy distributions of the three open-source LLMs on different domains of the MMLU dataset. We divide the 57 tasks of MMLU into 7 major categories based on their fields.
  • Figure 4: Accuracy comparison of brainstorming and CoT-based models on different datasets. CoT is 5-shot for MMLU, 8-shot for GSM, and 0-shot for ARC.
  • Figure 5: Accuracy comparison of brainstorming among small-parameter models (Qwen1.5-7B, Baichuan2-13B, Mistral-7B) with double-parameter models (Qwen1.5-14B, Baichuan2-13B) and with the MoE model (Mixtral 8x7B).
  • ...and 7 more figures