Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems

Dengyun Peng, Qiguang Chen, Bofei Liu, Jiannan Guan, Libo Qin, Zheng Yan, Jinhao Liu, Jianshu Zhang, Wanxiang Che

TL;DR

The paper addresses the problem of distinguishing inherent unsolvability from model capability limits in reasoning tasks. It introduces UnsolvableQA, a dataset constructed via a novel Reverse Construction method and logic-puzzle generators, and UnsolvableRL, a reinforcement learning framework with a dynamic, three-component reward system to train models to solve solvable tasks, detect unsolvability, and calibrate refusals. Empirical results show near-perfect unsolvability detection and substantial gains in solvable-task accuracy, along with the identification of Capability Collapse when unsolvability data is not included. The work demonstrates the necessity of explicit unsolvability data to prevent overconfidence and provides a practical approach for building more reliable AI systems that know when not to answer.

Abstract

Ensuring LLM reliability requires not only solving complex problems but also recognizing when a problem is unsolvable. Current models often struggle to distinguish objective unsolvability (inherent contradictions in the problem) from subjective capability limitations (problems beyond the model's competence), which leads to hallucinations and overconfidence. To address this, we propose UnsolvableQA and UnsolvableRL to solve feasible problems, detect inherent contradictions, and prudently refuse tasks beyond capability. Specifically, we construct UnsolvableQA, a dataset of paired solvable and unsolvable instances derived via a dual-track methodology: programmatic generation for logic puzzles and a novel "Reverse Construction" method that injects contradictions into valid reasoning chains for mathematics. Building on this dataset, we introduce UnsolvableRL, a reinforcement learning framework with three reward components jointly accounting for accuracy, unsolvability, and difficulty. Empirical results show that our approach achieves near-perfect unsolvability detection while also improving accuracy on solvable tasks. Crucially, we identify Capability Collapse, demonstrating that explicit exposure to unsolvable data is indispensable for preventing models from becoming systematically overconfident. Our code and data are available at https://github.com/sfasfaffa/unsolvableQA.

Paper Structure

This paper contains 35 sections, 18 equations, 4 figures, 4 tables, and 4 algorithms.

Figures (4)

  • Figure 1: Comparison of alignment paradigms. We treat the solution space (solvable vs. unsolvable) and question difficulty (easy vs. hard) as orthogonal dimensions, requiring models to objectively detect contradictions and subjectively calibrate confidence.
  • Figure 2: The UnsolvableQA data collection pipeline. (a) Mathematical Data Collection: We employ a "Reverse Construction" method where contradictions are injected into the solution path of solvable problems. The resulting instances are verified by an LLM to ensure they lead to a contradiction (unsolvable) rather than a valid solution. (b) Puzzle Data Collection: We utilize domain-specific programmatic generators and automated deterministic verifiers (e.g., SAT solvers, DFS) to rigorously classify randomly generated instances into solvable and unsolvable sets.
  • Figure 3: UnsolvableRL Reward Framework. We decompose the alignment into two orthogonal components: (a) Objective Detection, which incentivizes the model to distinguish between solvable and unsolvable queries. Crucially, we impose a penalty (-0.5) for falsely labeling solvable problems as unsolvable to safeguard the model's reasoning drive. (b) Subjective Calibration, which dynamically regulates refusal behavior. By comparing the target threshold ($\tau$) with the model's rollout accuracy ($\beta$), the mechanism encourages refusal on hard tasks (where $\tau > \beta$) and suppresses it on easy ones (where $\tau < \beta$) via the dynamic reward term $\lambda(\tau_i - \beta_i)$.
  • Figure 4: Evolution of Unsolvable Accuracy during training. The curves represent the average detection accuracy across unsolvable instances from the evaluated domains. Models trained with UnsolvableQA (solid lines) steadily improve their ability to identify contradictions. In contrast, the "No Data" ablation (dashed lines) leads to stagnation or, specifically for the 1.7B model, a catastrophic collapse in unsolvability detection.
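
The reward logic described in the Figure 3 caption can be illustrated with a minimal Python sketch. Only two details come from the caption: the -0.5 penalty for labeling a solvable problem as unsolvable, and the refusal term $\lambda(\tau_i - \beta_i)$. Everything else is an assumption made for illustration, not the authors' implementation: the function names, the three response kinds, the +1 reward for a correct outcome, the 0 reward for a wrong answer, and the default values of $\lambda$ and $\tau$.

```python
# Values taken from the Figure 3 caption.
FALSE_UNSOLVABLE_PENALTY = -0.5

# Hypothetical values, chosen only to make the sketch concrete.
ASSUMED_CORRECT_REWARD = 1.0
ASSUMED_LAMBDA = 0.5
ASSUMED_TAU = 0.5


def rollout_accuracy(correct_flags):
    """Estimate beta_i as the fraction of correct rollouts for one problem."""
    return sum(correct_flags) / len(correct_flags)


def sketch_reward(response_kind, is_solvable, is_correct, beta,
                  tau=ASSUMED_TAU, lam=ASSUMED_LAMBDA):
    """Illustrative reward for a single rollout.

    response_kind: "answer", "unsolvable", or "refuse" (assumed labels).
    beta: rollout accuracy of the model on this problem.
    """
    if response_kind == "unsolvable":
        # Objective detection: reward a correct flag (assumed +1),
        # penalize flagging a solvable problem (-0.5, from the caption).
        if is_solvable:
            return FALSE_UNSOLVABLE_PENALTY
        return ASSUMED_CORRECT_REWARD
    if response_kind == "refuse":
        # Subjective calibration: positive when tau > beta (hard task),
        # negative when tau < beta (easy task).
        return lam * (tau - beta)
    # Plain answer: assumed +1 if the problem is solvable and answered correctly.
    if is_solvable and is_correct:
        return ASSUMED_CORRECT_REWARD
    return 0.0


if __name__ == "__main__":
    beta_hard = rollout_accuracy([True, False, False, False])
    beta_easy = rollout_accuracy([True, True, True, False])
    print(sketch_reward("refuse", True, False, beta_hard))
    print(sketch_reward("refuse", True, False, beta_easy))
    print(sketch_reward("unsolvable", True, False, beta_easy))
```

Under these assumed values, refusing a problem the model solves in only one of four rollouts earns a small positive reward, while refusing one it solves in three of four rollouts is penalized, which matches the caption's description of encouraging refusal only where $\tau > \beta$.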