Conceptual and Unbiased Reasoning in Language Models

Ben Zhou, Hongming Zhang, Sihao Chen, Dian Yu, Hongwei Wang, Baolin Peng, Dan Roth, Dong Yu

TL;DR

This work introduces a conceptualization framework that dissects language model reasoning into abstract, verifiable symbolic processes by abstracting questions and executing symbolic programs. It demonstrates that standard LLMs suffer notable declines in conceptual reasoning compared with inductive approaches and proposes two strategies, similar-question induction signals and self-refinement, to mitigate these gaps. Across multiple benchmarks, these techniques yield 8–11% improvements and reduce reliance on memorized cues, suggesting a path toward unbiased, generalizable reasoning. The framework provides analytical tools to probe reasoning, enabling more controllable and robust AI systems with less inductive bias.

Abstract

Conceptual reasoning, the ability to reason in abstract and high-level perspectives, is key to generalization in human cognition. However, limited study has been done on large language models' capability to perform conceptual reasoning. In this work, we bridge this gap and propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions and generate solutions in a verifiable symbolic space. Using this framework as an analytical tool, we show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks compared to direct inference methods. We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making. We propose two techniques to add trustworthy induction signals by generating familiar questions with similar underlying reasoning paths and asking models to perform self-refinement. Experiments show that our proposed techniques improve models' conceptual reasoning performance by 8% to 11%, achieving a more robust reasoning system that relies less on inductive biases.

Paper Structure (25 sections, 3 figures, 14 tables)

This paper contains 25 sections, 3 figures, 14 tables.

Figures (3)

  • Figure 1: An overview of the conceptualization process. From an original question $Q$, GPT-4 generates an abstract question $Q_{abs}$ and the corresponding parameters. A downstream LLM then tries to generate a symbolic solution based only on $Q_{abs}$, and that solution is later executed with the actual parameters.
  • Figure 2: Comparison of a direct inference method (left) and conceptual inference (right). Results are taken from gpt-3.5-turbo.
  • Figure 3: An overview of the self-refinement process with a concrete example. The initial program does not consider that some animals can change their color for camouflage purposes, as pointed out in the generated similar question's CoT solution. Based on this information, the model can refine the initial program to account for this case.
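
The pipeline described in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `AbstractQuestion` class, the `run_symbolic_solution` helper, the convention that a program defines a function named `solve`, and the example question are all hypothetical, and the program string is hand-written as a stand-in for what a downstream LLM would generate from $Q_{abs}$ alone.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class AbstractQuestion:
    """Abstract question Q_abs plus the concrete parameters hidden from the solver."""
    text: str
    parameters: Dict[str, Any]


def run_symbolic_solution(program_source: str, parameters: Dict[str, Any]) -> Any:
    """Run a program that defines `solve(...)` and apply it to the actual parameters."""
    namespace: Dict[str, Any] = {}
    # Fine for a toy sketch; model-generated code should be sandboxed in practice.
    exec(program_source, namespace)
    solve: Callable[..., Any] = namespace["solve"]
    return solve(**parameters)


# Hypothetical original question Q: "Is a 3 kg bag heavier than a 2 kg box?"
q_abs = AbstractQuestion(
    text="Is object A heavier than object B, given their weights?",
    parameters={"weight_a": 3.0, "weight_b": 2.0},
)

# Hand-written stand-in for the symbolic solution an LLM would write
# after seeing only q_abs.text (never the concrete parameters).
program = """
def solve(weight_a, weight_b):
    return weight_a > weight_b
"""

# The solution is executed only afterwards, with the actual parameters.
answer = run_symbolic_solution(program, q_abs.parameters)
print(answer)
```

Because the solver never sees the concrete values, the same program can be re-executed with different parameters, which is what makes the solution checkable independently of any memorized answer to the original question.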