Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning

Yanan Chen, Ali Pesaranghader, Tanmana Sadhu

TL;DR

The paper tackles the reliability gap in mathematical reasoning for small-to-medium open LLMs and proposes Mixture of Opinions (MoO), a post-training approach that augments the training data of a stronger main LLM with Chain-of-Thought (CoT) reasoning and answers from weaker ancillary LLMs. MoO operates in three phases: data curation with external opinions, post-training the main model on this MoO data, and inference that leverages ancillary opinions for final answers. Experiments on GSM8K, AQuA-RAT, and MATH show that MoO outperforms strong baselines such as in-context learning (ICL), supervised fine-tuning (SFT), and Mixture of Agents (MoA) across configurations, with ablations confirming the importance of CoT and ancillary diversity. The work demonstrates that diverse reasoning paths from weaker models can meaningfully enhance multi-step mathematical reasoning and offers a practical, model-agnostic post-training strategy for improving LLM reasoning.

Abstract

Recent advances in Large Language Models (LLMs) have raised interest in their formal reasoning capabilities, particularly in mathematics. While closed LLMs like GPT-4 perform well on mathematical benchmarks such as GSM8K, it remains unclear whether small to medium-sized open LLMs can achieve similar performance, which calls their reliability into question. To close this gap, we propose a post-training approach that leverages a mixture of opinions (MoO) from weaker ancillary LLMs to enhance a (relatively) stronger LLM's reasoning. To this end, each post-training sample is augmented with Chain-of-Thought (CoT) reasoning steps and answers from ancillary LLMs, enabling the main LLM to learn from diverse perspectives. We compare MoO with standard supervised fine-tuning (SFT), few-shot prompting, and the Mixture of Agents (MoA) method on mathematical reasoning benchmarks. Our results show that incorporating weaker LLMs' opinions improves mathematical reasoning by an average of 5%, highlighting the value of diverse perspectives in reasoning tasks.
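
The augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Opinion` type, the `build_moo_sample` helper, the prompt layout, and the two stub ancillary models are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Opinion:
    source: str     # name of the model that produced the opinion
    reasoning: str  # chain-of-thought text
    answer: str     # final answer proposed by that model


# Hypothetical interface: an ancillary "LLM" is any callable that maps a
# question to an Opinion. A real model call would sit behind this.
Ancillary = Callable[[str], Opinion]


def build_moo_sample(question: str, gold_answer: str,
                     ancillaries: List[Ancillary]) -> Dict[str, str]:
    """Augment one training example with opinions from ancillary models."""
    opinions = [model(question) for model in ancillaries]
    opinion_text = "\n".join(
        f"Opinion {i} ({op.source}): {op.reasoning} Answer: {op.answer}"
        for i, op in enumerate(opinions, start=1)
    )
    # Assumed prompt layout; the paper's actual template may differ.
    prompt = f"Question: {question}\n{opinion_text}\nFinal answer:"
    return {"prompt": prompt, "target": gold_answer}


# Stub ancillary models standing in for weaker LLMs (one right, one wrong).
def weak_a(question: str) -> Opinion:
    return Opinion("weak_a", "3 + 4 = 7.", "7")


def weak_b(question: str) -> Opinion:
    return Opinion("weak_b", "3 * 4 = 12.", "12")


sample = build_moo_sample("What is 3 + 4?", "7", [weak_a, weak_b])
print(sample["prompt"])
```

The target stays the gold answer even when an ancillary opinion is wrong, so in this sketch the main model sees both correct and incorrect reasoning paths alongside the supervision signal.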

Paper Structure

This paper contains 19 sections, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: The Mixture of Opinions (MoO) Framework: During the first phase, we curate a post-training set by augmenting $m$ training examples with opinions collected from the ancillary and main LLMs. On the right, we show the second phase where we fine-tune the main LLM with the curated MoO dataset. For inference, the post-trained main LLM is used to predict an answer for a given question along with collected opinions from the ancillary LLMs.
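
The inference phase in the caption above might look like the sketch below. It assumes hypothetical interfaces: each ancillary model is a callable returning a (reasoning, answer) pair, and the post-trained main model is a callable from prompt to answer. The function name `moo_inference`, the prompt layout, and the stub models are illustrative assumptions, not the paper's code.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces for this sketch.
OpinionFn = Callable[[str], Tuple[str, str]]  # question -> (reasoning, answer)
MainFn = Callable[[str], str]                 # prompt -> final answer


def moo_inference(question: str, ancillaries: List[OpinionFn],
                  main_llm: MainFn) -> str:
    """Collect ancillary opinions, then let the main model decide."""
    lines = [f"Question: {question}"]
    for i, model in enumerate(ancillaries, start=1):
        reasoning, answer = model(question)
        lines.append(f"Opinion {i}: {reasoning} Answer: {answer}")
    lines.append("Final answer:")
    # The main model reads the opinions as context; it is not a majority vote.
    return main_llm("\n".join(lines))


seen_prompts: List[str] = []


def stub_main(prompt: str) -> str:
    # Stand-in for the post-trained main LLM: records the prompt it was
    # given and returns a fixed answer.
    seen_prompts.append(prompt)
    return "7"


def stub_weak_a(question: str) -> Tuple[str, str]:
    return ("3 + 4 = 7.", "7")


def stub_weak_b(question: str) -> Tuple[str, str]:
    return ("3 * 4 = 12.", "12")


result = moo_inference("What is 3 + 4?", [stub_weak_a, stub_weak_b], stub_main)
print(result)
```

Keeping the inference prompt in the same layout as the post-training prompts is a design assumption here: the main model is then queried in the format it was fine-tuned on.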