Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization

Mert Esencan, Tarun Advaith Kumar, Ata Akbari Asanjan, P. Aaron Lott, Masoud Mohseni, Can Unlu, Davide Venturelli, Alan Ho

TL;DR

The paper addresses the limited autonomous reasoning ability of large language models (LLMs) and the resulting need for automated, scalable prompting strategies. It introduces Combinatorial Reasoning (CR), a fully automated pipeline that samples candidate reasons from an LLM, encodes their relations as a Quadratic Unconstrained Binary Optimization (QUBO) problem, and uses Ising-machine solvers or related probabilistic optimizers to select a subset of reasons for a final Chain-of-Thought (CoT) style prompt. The authors formalize the QUBO construction (with $H=-(\tilde{L}+Q)$ and binary encodings $z_i=\sum_{w=0}^{W-1} 2^w x_{iw}$), demonstrate a sampling-and-optimization workflow, and validate CR on the BIG-Bench Hard (BBH) reasoning suite, reporting improved average performance over zero-shot and USP baselines and results competitive with human performance on some tasks. They also discuss hardware-accelerated solvers (e.g., Digital Annealer) and potential integrations with theorem provers and retrieval-augmented generation, presenting CR as a promising route to automated, scalable enhancement of AI reasoning in real-world knowledge tasks.
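As a rough illustration of the two formulas quoted above, the following Python sketch evaluates an energy of the form $H=-(\tilde{L}+Q)$ on binary selection variables and decodes an integer from its bits as $z=\sum_w 2^w x_w$. The weights, the function names, and the brute-force search are assumptions made for this example; the summary does not give the paper's actual definitions of $\tilde{L}$ and $Q$, and the paper uses Ising-machine solvers rather than enumeration.

```python
from itertools import product


def decode_integer(bits):
    """Return z = sum_w 2**w * bits[w], with bits[0] the least significant bit."""
    return sum((1 << w) * b for w, b in enumerate(bits))


def qubo_energy(x, linear, quadratic):
    """Evaluate H(x) = -(sum_i l_i x_i + sum_{i<j} q_ij x_i x_j)."""
    lin = sum(l * xi for l, xi in zip(linear, x))
    quad = sum(q * x[i] * x[j] for (i, j), q in quadratic.items())
    return -(lin + quad)


def solve_brute_force(linear, quadratic):
    """Enumerate all 2**n assignments; only feasible for small n."""
    return min(
        product((0, 1), repeat=len(linear)),
        key=lambda x: qubo_energy(x, linear, quadratic),
    )


# Hypothetical weights for three candidate reasons (not taken from the paper).
linear = [2.0, 0.0, -1.0]
quadratic = {(0, 1): 1.5, (0, 2): 0.5}

best = solve_brute_force(linear, quadratic)
print(best)                     # (1, 1, 0): keep reasons 0 and 1, drop reason 2
print(decode_integer([1, 0, 1]))  # 5
```

Exhaustive enumeration scales as $2^n$, which is why a dedicated heuristic or hardware solver becomes relevant once the number of candidate reasons grows.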

Abstract

Recent Large Language Models (LLMs) have demonstrated impressive capabilities at tasks that require human intelligence and are a significant step towards human-like artificial intelligence (AI). Yet the performance of LLMs at reasoning tasks has been subpar, and the reasoning capability of LLMs is a matter of significant debate. While it has been shown that the choice of prompting technique can alter an LLM's performance on a multitude of tasks, including reasoning, the best-performing techniques require human-made prompts written with knowledge of the tasks at hand. We introduce a framework for what we call Combinatorial Reasoning (CR), a fully automated prompting method in which reasons are sampled from an LLM pipeline and mapped into a Quadratic Unconstrained Binary Optimization (QUBO) problem. The framework investigates whether QUBO solutions can be profitably used to select a useful subset of the reasons to construct a Chain-of-Thought style prompt. We explore the acceleration of CR with specialized solvers. We also investigate the performance of simpler zero-shot strategies such as linear majority rule or random selection of reasons. Our preliminary study indicates that coupling a combinatorial solver to generative AI pipelines is an interesting avenue for AI reasoning and elucidates design principles for future CR methods.
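The two simpler strategies named in the abstract can be pictured with a short Python sketch. The abstract does not define "linear majority rule", so the rule below (keep a reason if it appears in more than a given fraction of the samples) is only one plausible reading; the function names, the `threshold` and `seed` parameters, and the toy samples are all hypothetical.

```python
import random
from collections import Counter


def linear_majority_selection(samples, threshold=0.5):
    """Keep reasons that appear in more than `threshold` of the samples.

    Assumed interpretation: each reason is judged on its own frequency,
    with no pairwise (quadratic) terms.
    """
    counts = Counter(r for sample in samples for r in set(sample))
    cutoff = threshold * len(samples)
    return sorted(r for r, c in counts.items() if c > cutoff)


def random_selection(samples, k, seed=0):
    """Pick up to k distinct reasons uniformly at random, ignoring frequency."""
    pool = sorted({r for sample in samples for r in sample})
    return random.Random(seed).sample(pool, min(k, len(pool)))


# Toy data: each inner list holds the reasons extracted from one LLM sample.
samples = [["A", "B"], ["A", "B"], ["A", "B", "C"], ["A"]]

print(linear_majority_selection(samples))  # ['A', 'B']
print(random_selection(samples, k=2))
```

Both functions ignore co-occurrence between reasons, which is the information a quadratic formulation would add.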

Paper Structure (39 sections, 9 equations, 3 figures, 4 tables)


Figures (3)

  • Figure 1: Workflow for Combinatorial Reasoning. The initial prompt is processed by the LLM $N$ times and the answers are filtered through a semantic matching procedure to produce answers with distinct reasons. The ensemble is mapped into a QUBO problem solved by an Ising machine. The final solution determines a set of reasons to be added to the prompt for a final LLM call that determines the final answer.
  • Figure 2: The performance of Combinatorial Reasoning (CR) against other methods. Human and USP results are taken from the BBH and USP publications, respectively [suzgun2022challenging, wan_universal_2023]. USP is evaluated on a different but comparable LLM, PaLM 2-M. Table \ref{tab:main} presents the cumulative results across BBH for these tasks. Tasks marked with $\Lambda$ are algorithmic tasks; the others are NLP tasks.
  • Figure 3: Baseline analysis comparing Quadratic CR (the method used in the main text) with Linear CR and Random Reasons. Overall performance across the ten datasets was Quadratic CR: $65.2\%$, Linear CR: $68.2\%$, Random: $57.4\%$. 0-shot and 0-shot CoT results are included for reference. The individual tasks are ordered according to the performance of 0-shot CoT.
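The Figure 1 workflow (sample, filter for distinct reasons, select, build the final prompt) can be sketched in Python for the filtering and prompt-assembly steps. This is a minimal sketch under stated assumptions: character-level similarity from the standard library's `difflib` stands in for the paper's semantic matching procedure, which is not specified here, and the function names, the `similarity` threshold, and the prompt wording are hypothetical.

```python
from difflib import SequenceMatcher


def deduplicate_reasons(reasons, similarity=0.8):
    """Merge near-duplicate reasons; return [representative, count] pairs.

    Stand-in for semantic matching: two reasons are treated as the same
    when their lowercase character-level similarity ratio is >= `similarity`.
    """
    clusters = []
    for reason in reasons:
        for cluster in clusters:
            ratio = SequenceMatcher(None, reason.lower(), cluster[0].lower()).ratio()
            if ratio >= similarity:
                cluster[1] += 1  # another occurrence of an existing reason
                break
        else:
            clusters.append([reason, 1])  # first occurrence of a new reason
    return clusters


def build_final_prompt(question, selected_reasons):
    """Assemble a CoT-style prompt from the question and the selected reasons."""
    lines = [question, "", "Consider the following reasons:"]
    lines += [f"- {r}" for r in selected_reasons]
    lines.append("Answer:")
    return "\n".join(lines)


raw = ["The sky is blue.", "the sky is blue.", "Grass is green."]
clusters = deduplicate_reasons(raw)
print(clusters)  # [['The sky is blue.', 2], ['Grass is green.', 1]]

prompt = build_final_prompt("What color is the sky?", [c[0] for c in clusters])
print(prompt)
```

The per-reason counts produced by the filtering step are the kind of statistics a selection stage could consume; in the paper that stage is a QUBO solved by an Ising machine.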