Differentiable Symbolic Planning: A Neural Architecture for Constraint Reasoning with Learned Feasibility

Venkatakrishna Reddy Oruganti

Abstract

Neural networks excel at pattern recognition but struggle with constraint reasoning -- determining whether configurations satisfy logical or physical constraints. We introduce Differentiable Symbolic Planning (DSP), a neural architecture that performs discrete symbolic reasoning while remaining fully differentiable. DSP maintains a feasibility channel (phi) that tracks constraint-satisfaction evidence at each node, aggregates it into a global feasibility signal (Phi) through a learned rule-weighted combination, and uses sparsemax attention to achieve exact-zero discrete rule selection. We integrate DSP into a Universal Cognitive Kernel (UCK) that combines graph attention with iterative constraint propagation. Evaluated on three constraint reasoning benchmarks -- graph reachability, Boolean satisfiability, and planning feasibility -- UCK+DSP achieves 97.4% accuracy on planning under 4x size generalization (vs. 59.7% for ablated baselines), 96.4% on SAT under 2x generalization, and maintains balanced performance on both positive and negative classes where standard neural approaches collapse. Ablation studies reveal that global phi aggregation is critical: removing it causes accuracy to drop from 98% to 64%. The learned phi signal exhibits interpretable semantics, with mean values of +18 for feasible cases and -13 for infeasible cases emerging without explicit supervision.

Figures (4)

  • Figure 1: Universal Cognitive Kernel (UCK) with DSP Module. The system processes graph-structured inputs through $T$ rollout steps, alternating between graph attention and DSP updates. The DSP module maintains learnable rule embeddings, computes sparse rule activation ($\alpha$) and node selection ($\beta$) using sparsemax, and aggregates local feasibility into a global signal ($\Phi$).
  • Figure 2: DSP Module detailed architecture. The module computes global summaries, applies sparsemax for discrete rule activation ($\alpha$) and node selection ($\beta$), computes gated effects, and aggregates local feasibility into the critical global $\Phi$ signal. Ablation evidence shows removing global $\phi$ causes 34-point accuracy collapse.
  • Figure 3: Comparison of softmax vs. sparsemax attention. Softmax distributes weight across all rules, causing interference between them. Sparsemax produces exact zeros, enabling discrete rule selection. This difference accounts for a 26-percentage-point accuracy gap (71.1% vs. 97.4%).
  • Figure 4: Learned $\phi$ semantics. The global feasibility signal learns to separate feasible ($\mu = +18$) from infeasible ($\mu = -13$) cases with 31.5-point separation, emerging purely from training without explicit supervision of $\phi$ values.
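The softmax-vs-sparsemax distinction in Figure 3 hinges on the sparsemax projection (Martins & Astudillo, 2016), which maps logits onto the probability simplex and can assign exactly zero weight to low-scoring rules. The following is a minimal NumPy sketch of that projection, not the paper's implementation; the function name and the example logits are illustrative.

```python
import numpy as np

def sparsemax(logits):
    """Project logits onto the probability simplex (Martins & Astudillo, 2016).

    Unlike softmax, the result can contain exact zeros, which is what
    enables discrete rule selection in the DSP module.
    """
    z = np.asarray(logits, dtype=float)
    z_sorted = np.sort(z)[::-1]          # descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    # Support set: largest k with 1 + k * z_sorted[k-1] > cumsum[k-1]
    support = k[1 + k * z_sorted > cumsum]
    k_max = support[-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max  # threshold
    return np.maximum(z - tau, 0.0)

# Example: one rule logit dominates, so the others are zeroed out exactly,
# whereas softmax would still spread positive mass over every rule.
alpha = sparsemax([2.0, 1.0, 0.1])   # -> array([1., 0., 0.])
```

The exact zeros mean that inactive rules contribute nothing to the gated effects, removing the interference that softmax's strictly positive weights introduce.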