Learning to Reason via Mixture-of-Thought for Logical Reasoning

Tong Zheng, Lichang Chen, Simeng Han, R. Thomas McCoy, Heng Huang

TL;DR

This work tackles the limitation of single-modality reasoning in LLMs by introducing Mixture-of-Thought (MoT), which jointly trains and infers across natural language, code, and a novel truth-table modality for logic problems. MoT uses a self-evolving training loop with a tailored reward to improve reasoning traces in all modalities and a majority-vote inference mechanism to combine their strengths. Across FOLIO and ProofWriter, MoT yields substantial gains over single-modality CoT baselines, improving average accuracy by up to 11.7 percentage points, with notable gains on harder problems; a 9B MoT model matches the performance of larger closed models on FOLIO. The results demonstrate that modality-level diversity and cross-modal training unlock complementary reasoning capabilities, offering a practical path to stronger open-source reasoning systems and more robust generalization in complex logical tasks.

Abstract

Human beings naturally utilize multiple reasoning modalities to learn and solve logical problems, i.e., different representational formats such as natural language, code, and symbolic logic. In contrast, most existing LLM-based approaches operate with a single reasoning modality during training, typically natural language. Although some methods explored modality selection or augmentation at inference time, the training process remains modality-blind, limiting synergy among modalities. To fill in this gap, we propose Mixture-of-Thought (MoT), a framework that enables LLMs to reason across three complementary modalities: natural language, code, and a newly introduced symbolic modality, truth-table, which systematically enumerates logical cases and partially mitigates key failure modes in natural language reasoning. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of three modalities to produce better predictions. Experiments on logical reasoning benchmarks including FOLIO and ProofWriter demonstrate that our MoT framework consistently and significantly outperforms strong LLM baselines with single-modality chain-of-thought approaches, achieving up to +11.7pp average accuracy gain. Further analyses show that our MoT framework benefits both training and inference stages; that it is particularly effective on harder logical reasoning problems; and that different modalities contribute complementary strengths, with truth-table reasoning helping to overcome key bottlenecks in natural language inference.
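
To make the idea of "systematically enumerating logical cases" concrete, here is a minimal sketch of truth-table style checking in Python. It is an illustration only: in the paper the truth-table modality is a reasoning trace written by the LLM, not a program. The function name `classify`, the label strings, and the rain/wet example are assumptions introduced here.

```python
from itertools import product

def classify(premises, conclusion, variables):
    """Label a conclusion by enumerating every truth assignment.

    premises and conclusion are functions from an assignment (dict) to bool.
    Hypothetical helper, not taken from the paper.
    """
    # Build the full truth table over the propositional variables.
    rows = [dict(zip(variables, values))
            for values in product([False, True], repeat=len(variables))]
    # Keep only the rows in which every premise holds.
    consistent = [row for row in rows if all(p(row) for p in premises)]
    if not consistent:
        return "Inconsistent premises"
    outcomes = {conclusion(row) for row in consistent}
    if outcomes == {True}:
        return "True"
    if outcomes == {False}:
        return "False"
    return "Uncertain"  # the conclusion holds in some rows but not others

def rain_implies_wet(row):
    return (not row["rain"]) or row["wet"]

def is_raining(row):
    return row["rain"]

def is_wet(row):
    return row["wet"]

variables = ["rain", "wet"]

# Modus ponens: rain -> wet, rain, therefore wet.
valid = classify([rain_implies_wet, is_raining], is_wet, variables)

# Invalid converse: rain -> wet, wet, therefore rain?
converse = classify([rain_implies_wet, is_wet], is_raining, variables)

print(valid, converse)
```

Enumerating every row forces the second query to account for the case where the ground is wet without rain, which is the kind of branch a purely verbal argument can skip.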

Paper Structure

This paper contains 59 sections, 3 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) Qwen‑2.5‑7B‑Instruct solves $\simeq$20% of FOLIO and $\simeq$35% of ProofWriter exclusively per paradigm. (b) Code+NL+truth‑table yields higher upper‑bound coverage than code+NL alone (Xiong et al., 2024). (c) In NL modes, invalid‑converse (IC) and missing‑branch (MB) errors comprise $\simeq$66% of failures (CS: commonsense injection; FM: factual misquote). Percentages sum to more than 100% because some cases exhibit multiple error types. Illustrative examples are provided in the appendix.
  • Figure 2: Illustration of our MoT framework. (a) Training phase with three key steps: 1) Rationale Generation, where, given an initial seed dataset, the LLM generates rationales across reasoning modalities (NL, Code, and Truth Table); 2) Quality Checking and Merging, where generated rationales are checked for correctness and format consistency, then merged into high-quality MoT training data; and 3) Finetuning, where the model is trained on the MoT data. These steps repeat iteratively, forming a self-evolving training cycle. (b) Inference phase: the trained model generates outputs for each reasoning modality and applies majority voting to yield the final prediction (e.g., A).
  • Figure 3: Pass@k vs. Sample Budget on FOLIO. (a) MoT-trained model with MoT sampling outperforms the base model (Gemma-2-9b-It) with SoT sampling. (b) Within the MoT-trained model, MoT sampling yields higher Pass@k than SoT sampling (NL_CoT, Truth Table, Code).
  • Figure 4: Performance comparison of different thought paradigms across reasoning depths. On FOLIO and ProverQA benchmarks, MoT inference exhibits better performance on difficult problems.
  • Figure 5: Accuracy (%) over three self-evolving rounds on the FOLIO benchmark for: distilled NL-CoT (first-round only), raw NL-CoT (no distillation), and MoT (no distillation). The performance is evaluated with NL-based reasoning.
  • ...and 2 more figures
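
The inference phase described in Figure 2(b) can be sketched as a majority vote over one answer per modality. This is a minimal sketch under stated assumptions: the modality keys, the function name `majority_vote`, and especially the tie-breaking rule (fall back to a fixed modality priority) are hypothetical, since the summary above does not say how ties are resolved.

```python
from collections import Counter

# Assumed tie-break order; not specified in the source.
PRIORITY = ("nl", "code", "truth_table")

def majority_vote(predictions):
    """Return the most common answer among per-modality predictions.

    predictions maps a modality name to its predicted label, e.g.
    {"nl": "A", "code": "B", "truth_table": "A"}.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions.values())
    best = max(counts.values())
    winners = [label for label, count in counts.items() if count == best]
    if len(winners) == 1:
        return winners[0]
    # Tie: pick the answer from the highest-priority modality (assumption).
    for modality in PRIORITY:
        if predictions.get(modality) in winners:
            return predictions[modality]
    return winners[0]

agreed = majority_vote({"nl": "A", "code": "B", "truth_table": "A"})
tied = majority_vote({"nl": "C", "code": "B", "truth_table": "A"})
print(agreed, tied)
```

With three modalities, a tie only occurs when all three disagree, so the choice of tie-break rule matters only in that case.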