LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models
Yiwei Liu, Yucheng Li, Xiao Li, Gong Cheng
TL;DR
LogiNumSynth addresses the difficulty of evaluating and training language models on tasks that require integrated logical and numerical reasoning. It introduces a controllable synthesizer that builds formal worlds of Facts, Rules, and Queries and translates them into diverse natural-language problems with explicit intermediate reasoning steps. An evaluation of 29 large language models reveals persistent weaknesses in joint reasoning, while targeted training on the synthetic data can improve performance on external benchmarks. With fine-grained control over world richness, reasoning depth, and arithmetic complexity, LogiNumSynth offers a scalable, extensible resource for diagnosing and training integrated reasoning in LLMs.
Abstract
Joint logical-numerical reasoning remains a major challenge for language models, yet existing datasets rely on fixed rule sets and offer limited control over task complexity, constraining their generalizability for evaluation and training. We present LogiNumSynth, a flexible natural language problem synthesizer that generates tasks requiring proficiency in both logical reasoning (e.g., rule-based reasoning) and numerical reasoning (e.g., arithmetic computation). LogiNumSynth supports fine-grained control over reasoning world richness, logical reasoning depth, and the complexity of numerical computations, enabling flexible data synthesis across difficulty levels. We make three key contributions: (1) Synthesizer: synthesizing fully controllable joint reasoning tasks in natural language; (2) Evaluation & Process Analysis: evaluating both process accuracy and answer accuracy; (3) Targeted Training: using synthesized data to enhance LLMs' reasoning performance. Experiments with multiple LLMs highlight persistent weaknesses in logical-numerical reasoning, showing that LogiNumSynth can serve as both a diagnostic tool and a source of targeted supervision for advancing integrated reasoning skills.
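
As a rough illustration of the kind of controllable synthesis described above, the following Python sketch builds a toy world with one Fact, a chain of arithmetic Rules, and a Query, then derives the answer together with its intermediate steps. It is not the LogiNumSynth implementation: the names `Fact`, `Rule`, `synthesize`, `solve`, and `verbalize`, the parameters `depth` and `max_operand`, and the restriction to a single entity with `+` and `*` operations are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: int


@dataclass(frozen=True)
class Rule:
    source: str   # attribute the rule reads
    target: str   # attribute the rule derives
    op: str       # "+" or "*"
    operand: int


def synthesize(depth, max_operand, seed=0):
    """Build a toy world: one fact, a chain of `depth` rules, and a query.

    `depth` stands in for logical reasoning depth and `max_operand` for
    arithmetic complexity (both hypothetical knobs).
    """
    rng = random.Random(seed)
    fact = Fact("Alice", "a0", rng.randint(1, max_operand))
    rules = [
        Rule(f"a{i}", f"a{i + 1}", rng.choice("+*"), rng.randint(1, max_operand))
        for i in range(depth)
    ]
    query = ("Alice", f"a{depth}")
    return fact, rules, query


def solve(fact, rules, query):
    """Apply the rules in chain order and record each intermediate step."""
    values = {fact.attribute: fact.value}
    steps = []
    for rule in rules:
        x = values[rule.source]
        y = x + rule.operand if rule.op == "+" else x * rule.operand
        values[rule.target] = y
        steps.append(f"{rule.target} = {x} {rule.op} {rule.operand} = {y}")
    return values[query[1]], steps


def verbalize(fact, rules, query):
    """Render the formal world as a plain natural-language problem."""
    words = {"+": "plus", "*": "times"}
    lines = [f"{fact.entity}'s {fact.attribute} is {fact.value}."]
    for r in rules:
        lines.append(
            f"If someone has {r.source}, their {r.target} is "
            f"{r.source} {words[r.op]} {r.operand}."
        )
    lines.append(f"What is {query[0]}'s {query[1]}?")
    return " ".join(lines)


if __name__ == "__main__":
    fact, rules, query = synthesize(depth=3, max_operand=9, seed=1)
    print(verbalize(fact, rules, query))
    answer, steps = solve(fact, rules, query)
    print(steps, answer)
```

Because the intermediate steps are recorded alongside the final value, a model's output could in principle be scored on both its reasoning process and its answer, in the spirit of the process accuracy and answer accuracy mentioned in the abstract. Raising `depth` lengthens the rule chain, and raising `max_operand` makes the arithmetic harder.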
