SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Junteng Liu, Yuanxiang Fan, Zhuo Jiang, Han Ding, Yongyi Hu, Chi Zhang, Yiqi Shi, Shitong Weng, Aili Chen, Shiqi Chen, Yunan Huang, Mozhi Zhang, Pengyu Zhao, Junjie Yan, Junxian He

TL;DR

SynLogic introduces a scalable framework to synthesize verifiable logical reasoning data across 35 tasks and pairs it with a reinforcement-learning regimen using verifiable rewards. The dataset comes in two difficulty variants (Hard and Easy) and enables zero-shot evaluation across logic and math benchmarks, achieving state-of-the-art open-source logical reasoning performance on KOR-Bench and BBEH. The authors further show that mixing SynLogic with math and coding data improves training efficiency and generalization, and that large-scale Zero-RL training with diverse data yields strong cross-domain performance. Open-sourcing both the data synthesis pipeline and the dataset, SynLogic offers a practical resource to advance general reasoning capabilities in LLMs.

Abstract

Recent advances such as OpenAI-o1 and DeepSeek R1 have demonstrated the potential of Reinforcement Learning (RL) to enhance reasoning abilities in Large Language Models (LLMs). While open-source replication efforts have primarily focused on mathematical and coding domains, methods and resources for developing general reasoning capabilities remain underexplored. This gap is partly due to the challenge of collecting diverse and verifiable reasoning data suitable for RL. We hypothesize that logical reasoning is critical for developing general reasoning capabilities, as logic forms a fundamental building block of reasoning. In this work, we present SynLogic, a data synthesis framework and dataset that generates diverse logical reasoning data at scale, encompassing 35 diverse logical reasoning tasks. The SynLogic approach enables controlled synthesis of data with adjustable difficulty and quantity. Importantly, all examples can be verified by simple rules, making them ideally suited for RL with verifiable rewards. In our experiments, we validate the effectiveness of RL training on the SynLogic dataset based on 7B and 32B models. SynLogic leads to state-of-the-art logical reasoning performance among open-source datasets, surpassing DeepSeek-R1-Distill-Qwen-32B by 6 points on BBEH. Furthermore, mixing SynLogic data with mathematical and coding tasks improves the training efficiency of these domains and significantly enhances reasoning generalization. Notably, our mixed training model outperforms DeepSeek-R1-Zero-Qwen-32B across multiple benchmarks. These findings position SynLogic as a valuable resource for advancing the broader reasoning capabilities of LLMs. We open-source both the data synthesis pipeline and the SynLogic dataset at https://github.com/MiniMax-AI/SynLogic.
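
The abstract's two key properties, adjustable difficulty and rule-based verification, can be illustrated with a minimal sketch. The task below (a Latin-square fill-in puzzle) and the names `generate_instance`, `to_prompt`, and `verify` are hypothetical and chosen for illustration only; they are not taken from the SynLogic pipeline. The sketch assumes a binary reward of 1.0 for a rule-satisfying answer and 0.0 otherwise.

```python
import random


def generate_instance(size, num_blanks, seed=0):
    """Build a Latin-square puzzle (hypothetical task).

    `size` and `num_blanks` act as the difficulty knobs: larger grids and
    more blanked cells give harder instances.
    """
    rng = random.Random(seed)
    # Start from a cyclic Latin square, then shuffle rows and columns.
    # Both shuffles preserve the "each value once per row/column" property.
    base = [[(r + c) % size + 1 for c in range(size)] for r in range(size)]
    rng.shuffle(base)
    cols = list(range(size))
    rng.shuffle(cols)
    solution = [[row[c] for c in cols] for row in base]

    # Blank out `num_blanks` distinct cells (0 marks an empty cell).
    puzzle = [row[:] for row in solution]
    cells = [(r, c) for r in range(size) for c in range(size)]
    for r, c in rng.sample(cells, num_blanks):
        puzzle[r][c] = 0
    return puzzle, solution


def to_prompt(puzzle):
    """Render the instance as a natural-language instruction."""
    n = len(puzzle)
    grid = "\n".join(" ".join(str(v) for v in row) for row in puzzle)
    return (
        f"Fill every 0 so that each row and each column contains "
        f"the numbers 1..{n} exactly once.\n{grid}"
    )


def verify(puzzle, answer):
    """Rule-based check returning a binary reward (assumed 1.0 / 0.0)."""
    n = len(puzzle)
    target = set(range(1, n + 1))
    # The answer must have the same shape as the puzzle.
    if len(answer) != n or any(len(row) != n for row in answer):
        return 0.0
    # Given clues must be left unchanged.
    for r in range(n):
        for c in range(n):
            if puzzle[r][c] != 0 and answer[r][c] != puzzle[r][c]:
                return 0.0
    rows_ok = all(set(row) == target for row in answer)
    cols_ok = all({answer[r][c] for r in range(n)} == target for c in range(n))
    return 1.0 if rows_ok and cols_ok else 0.0
```

Note that `verify` checks the rules rather than comparing against the stored solution, so any valid completion earns the reward. That choice matters when a puzzle with many blanks admits more than one solution.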

Paper Structure

This paper contains 36 sections, 1 equation, 11 figures, 4 tables.

Figures (11)

  • Figure 1: The framework of logic data synthesis. The process begins with the selection of suitable tasks and the identification of key parameters that control task difficulty. Next, logic instances are generated with appropriate difficulty control (e.g., setting the grid size of Sudoku to 7). These instances are subsequently formalized into natural language instructions. Each task is paired with a task-specific verifier to check the correctness of responses. This framework enables the systematic synthesis of high-quality logic data, covering a wide range of difficulty levels and 35 task types.
  • Figure 2: Evaluation of task difficulty across our dataset versions. (a) shows the performance of 7B-scale models on the SynLogic-Easy dataset, while (b) shows the performance of 32B-scale models on the more challenging SynLogic-Hard dataset. Results are measured using avg@8 (average pass rate over eight attempts) and pass@8 (success within eight attempts) metrics, illustrating the appropriate difficulty control for each model scale.
  • Figure 3: The prompt template used for training models on SynLogic data.
  • Figure 4: Response length and reflection ratio across the 7B and 32B training process on the training dataset. The reflection ratio represents the proportion of generated responses containing at least one reflection phrase (including "recheck", "rethink", "try again", "let's correct it", "re-evaluate", "check again", "think again").
  • Figure 5: Performance comparison of 7B models trained on mixed data (Logic+Math) versus math-only (Math) data. (a) Accuracy on KOR-Bench. (b) Average accuracy across three mathematics benchmarks (MATH 500, AIME 2024, AMC 2023) as a function of training steps. (c) Average accuracy on mathematics benchmarks as a function of consumed mathematical data volume.
  • ...and 6 more figures
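
The metrics named in the captions of Figures 2 and 4 can be sketched as follows. This is one reading of the caption wording, not the authors' evaluation code: avg@k is taken as the per-problem fraction of correct attempts averaged over problems, pass@k as the fraction of problems with at least one correct attempt, and the reflection ratio as the share of responses containing any listed phrase. The function names and the case-insensitive substring match are assumptions.

```python
from typing import Sequence

# Phrases listed in the Figure 4 caption.
REFLECTION_PHRASES = (
    "recheck",
    "rethink",
    "try again",
    "let's correct it",
    "re-evaluate",
    "check again",
    "think again",
)


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Mean over problems of the fraction of correct attempts."""
    return sum(sum(attempts) / len(attempts) for attempts in correct) / len(correct)


def pass_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems solved by at least one attempt."""
    return sum(any(attempts) for attempts in correct) / len(correct)


def reflection_ratio(responses: Sequence[str]) -> float:
    """Share of responses containing at least one reflection phrase.

    Matching is case-insensitive substring search (an assumption).
    """
    hits = sum(
        any(phrase in response.lower() for phrase in REFLECTION_PHRASES)
        for response in responses
    )
    return hits / len(responses)


# Two problems, eight attempts each: the first is solved twice, the second never.
results = [[True, True] + [False] * 6, [False] * 8]
print(avg_at_k(results))   # 0.125
print(pass_at_k(results))  # 0.5
print(reflection_ratio(["Let me recheck the sum.", "The answer is 4."]))  # 0.5
```

A large gap between pass@8 and avg@8 indicates problems that the model can solve only occasionally.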