Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences

Kedi Chen; Zhikai Lei; Fan Zhang; Yinqi Zhang; Qin Chen; Jie Zhou; Liang He; Qipeng Guo; Kai Chen; Wei Zhang

TL;DR

This work tackles the underexplored area of inductive reasoning in large language models by addressing the data bottleneck with CodeSeq, a synthetic dataset built from number sequences. It treats finding the general term $a_n$ as a code problem and injects case-based supervision through code unit tests within a three-stage synthetic data pipeline. Finetuning LLMs with CodeSeq yields improvements on code-generation benchmarks and strong transfer to broad reasoning tasks, indicating that inductive reasoning data can meaningfully enhance reasoning abilities. The approach demonstrates the practical potential of harnessing sequence-based inductive tasks to boost generalization and reasoning in LLMs.

Abstract

Large language models have made remarkable progress in reasoning capabilities. Existing work focuses mainly on deductive reasoning tasks (e.g., code and math), while inductive reasoning, a mode that aligns more closely with human learning, remains understudied. We attribute this to the difficulty of obtaining high-quality process supervision data for inductive reasoning. To this end, we use number sequences as a source of inductive reasoning data. We package sequences into algorithmic problems that ask for the general term of each sequence, expressed as a code solution. This lets us verify whether the code solution holds for any term in the current sequence and inject case-based supervision signals through code unit tests. We build a synthetic data pipeline for sequences and use it to construct a training dataset, CodeSeq. Experimental results show that models tuned with CodeSeq improve on both code and comprehensive reasoning benchmarks.
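As a rough illustration of what verifying a code solution against the terms of a sequence could look like, the following is a minimal sketch and not the authors' pipeline. The helper name `check_general_term`, the example sequence (triangular numbers), the 1-based indexing, and the two candidate functions are all assumptions introduced here for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical helper (not from the paper): run a candidate general-term
# function against known terms and collect the failing cases.
def check_general_term(
    candidate: Callable[[int], int],
    terms: List[int],
    offset: int = 1,  # assumption: the sequence is indexed from n = 1
) -> List[Tuple[int, int, object]]:
    """Return (n, expected, actual) for every term the candidate gets wrong."""
    failures = []
    for i, expected in enumerate(terms):
        n = i + offset
        try:
            actual = candidate(n)
        except Exception as exc:  # a crash also counts as a failed case
            actual = repr(exc)
        if actual != expected:
            failures.append((n, expected, actual))
    return failures

# Illustrative sequence chosen for this sketch: triangular numbers.
TERMS = [1, 3, 6, 10, 15]

def wrong_candidate(n: int) -> int:
    # Fits the first two terms only (odd numbers: 1, 3, 5, 7, 9).
    return 2 * n - 1

def right_candidate(n: int) -> int:
    # Closed form for the n-th triangular number.
    return n * (n + 1) // 2

failures_wrong = check_general_term(wrong_candidate, TERMS)
failures_right = check_general_term(right_candidate, TERMS)

print("wrong candidate failures:", failures_wrong)
print("right candidate failures:", failures_right)
```

In this sketch, the list of failing cases plays the role of a case-based signal: each entry names the index where the hypothesis breaks, the expected term, and what the candidate produced, which is the kind of feedback a unit test can supply for a proposed general term.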
