SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Kechen Li, Wenqi Zhu, Coralia Cartis, Tianbo Ji, Shiwei Liu

TL;DR

This work addresses the challenging problem of deciding whether a multivariate polynomial is nonnegative, an NP-hard task tied to Hilbert’s Seventeenth Problem, by constructing SoS-1K, a dataset of ~1,000 polynomials with expert reasoning guides. It introduces three prompting regimes (SoS Plain, SoS Simple, SoS Reasoning) and a five-step verification framework that guides LLMs through degree checks, nonnegativity, special cases, square-form representations, and matrix decompositions, and uses them to evaluate multiple state-of-the-art models. Plain prompts underperform, while reasoning-guided prompts boost accuracy dramatically (up to 81%), with reasoning-focused models performing best; a 7B model fine-tuned on SoS-1K (SoS-7B) reaches 70% accuracy, surpassing some much larger baselines while using far less compute. The results demonstrate the potential of AI to assist in solving NP-hard mathematical problems and to accelerate discovery in real algebraic geometry, while also highlighting limitations and the need for careful validation against traditional solvers and for longer contexts.
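The cheapest checks in such a framework can be illustrated with a short sketch. It is not the paper's implementation: it assumes a hypothetical representation of a polynomial as a dict from exponent tuples to coefficients, and the helper names (`total_degree`, `evaluate`, `screen_nonnegativity`) are illustrative. It applies two necessary conditions. A polynomial of odd total degree always takes negative values, and any sample point with a negative value disproves nonnegativity. Neither check can prove nonnegativity, so a polynomial that passes both is reported as undecided.

```python
import random
from fractions import Fraction

# Hypothetical representation: {exponent_tuple: coefficient}.
# Example: x^2 - 2xy + 2y^2 in variables (x, y) is {(2, 0): 1, (1, 1): -2, (0, 2): 2}.


def total_degree(poly):
    """Largest total degree over monomials with a nonzero coefficient."""
    return max((sum(exps) for exps, c in poly.items() if c != 0), default=0)


def evaluate(poly, point):
    """Evaluate the polynomial exactly at a point (ints or Fractions)."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for xi, ei in zip(point, exps):
            term *= xi ** ei
        total += term
    return total


def screen_nonnegativity(poly, trials=2000, seed=0):
    """Apply two necessary conditions for p(x) >= 0 on all of R^n.

    Returns ("not nonnegative", reason) when a check fails, otherwise
    ("undecided", None): passing both checks proves nothing.
    """
    # Degree check: if the total degree d is odd, the top-degree part p_d
    # satisfies p_d(-v) = -p_d(v), so p(t*w) -> -infinity along some ray.
    if total_degree(poly) % 2 == 1:
        return ("not nonnegative", "odd total degree")
    # Sampling check: one point with p(point) < 0 disproves nonnegativity.
    nvars = len(next(iter(poly)))  # assumes a nonempty polynomial
    rng = random.Random(seed)
    for _ in range(trials):
        point = tuple(Fraction(rng.randint(-20, 20), rng.randint(1, 5))
                      for _ in range(nvars))
        if evaluate(poly, point) < 0:
            return ("not nonnegative", point)
    return ("undecided", None)


if __name__ == "__main__":
    print(screen_nonnegativity({(3, 0): 1, (0, 2): 1}))              # x^3 + y^2
    print(screen_nonnegativity({(2, 0): 1, (1, 1): -3, (0, 2): 1}))  # x^2 - 3xy + y^2
    print(screen_nonnegativity({(2, 0): 1, (1, 1): -2, (0, 2): 2}))  # x^2 - 2xy + 2y^2
```

Here x^3 + y^2 fails the degree check, x^2 - 3xy + y^2 is caught by a sampled point such as one with x = y, and x^2 - 2xy + 2y^2 = (x - y)^2 + y^2 is left undecided because sampling alone can never certify nonnegativity.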

Abstract

Large Language Models (LLMs) have achieved human-level proficiency across diverse tasks, but their ability to perform rigorous mathematical problem solving remains an open challenge. In this work, we investigate a fundamental yet computationally intractable problem: determining whether a given multivariate polynomial is nonnegative. This problem, closely related to Hilbert's Seventeenth Problem, plays a crucial role in global polynomial optimization and has applications in various fields. First, we introduce SoS-1K, a meticulously curated dataset of approximately 1,000 polynomials, along with expert-designed reasoning instructions based on five progressively challenging criteria. Evaluating multiple state-of-the-art LLMs, we find that without structured guidance, all models perform only slightly above the 50% random-guess baseline. However, high-quality reasoning instructions significantly improve accuracy, boosting performance to as much as 81%. Furthermore, our 7B model, SoS-7B, fine-tuned on SoS-1K for just 4 hours, outperforms the 671B DeepSeek-V3 and GPT-4o-mini in accuracy while requiring only 1.8% and 5% of the computation time needed by the latter two models, respectively. Our findings highlight the potential of LLMs to push the boundaries of mathematical reasoning and tackle NP-hard problems.
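A sum-of-squares certificate is what turns nonnegativity into something checkable. If p(x) = z(x)^T Q z(x) for a vector of monomials z(x) and a positive semidefinite (PSD) Gram matrix Q, then factoring Q writes p explicitly as a weighted sum of squares, so p >= 0. The sketch below is illustrative and not claimed to be the paper's method; the function name `ldl_sos` is hypothetical. It assumes Q is already given (in general Q is not unique, and searching for a PSD choice is a semidefinite program that this code does not solve) and runs an exact rational LDL^T factorization that either returns the squares or reports that this particular Q is not PSD.

```python
from fractions import Fraction


def ldl_sos(Q, basis):
    """Factor a symmetric Gram matrix Q exactly as sum_k d_k * l_k l_k^T.

    If every pivot d_k is >= 0, then z^T Q z = sum_k d_k * (l_k . z)^2 is an
    explicit sum of squares in the monomials named by `basis`. Returns a list
    of (d_k, {monomial: coefficient}) pairs, or None if Q is not PSD.
    """
    n = len(Q)
    A = [[Fraction(v) for v in row] for row in Q]  # exact rational working copy
    squares = []
    for k in range(n):
        pivot = A[k][k]
        if pivot < 0:
            return None  # negative pivot of a Schur complement: Q is not PSD
        if pivot == 0:
            if any(A[k][j] != 0 for j in range(k + 1, n)):
                return None  # zero diagonal, nonzero off-diagonal: not PSD
            continue  # row/column k is identically zero; nothing to eliminate
        # Column k of L, scaled so that L[k][k] = 1.
        form = {basis[j]: A[j][k] / pivot for j in range(k, n) if A[j][k] != 0}
        squares.append((pivot, form))
        # Schur complement update of the trailing block.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j] / pivot
    return squares


if __name__ == "__main__":
    # x^2 - 2xy + 2y^2 with z = (x, y)
    print(ldl_sos([[1, -1], [-1, 2]], ["x", "y"]))
    # x^4 - 2x^3 + 3x^2 - 2x + 1 with z = (1, x, x^2)
    print(ldl_sos([[1, -1, 1], [-1, 1, -1], [1, -1, 1]], ["1", "x", "x^2"]))
```

For x^2 - 2xy + 2y^2 with Q = [[1, -1], [-1, 2]], the sketch returns the squares (x - y)^2 and y^2; for x^4 - 2x^3 + 3x^2 - 2x + 1, the rank-one Q above yields the single square (1 - x + x^2)^2. A None result only rules out the given Q, not every Gram matrix of p.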
