A Neuro-Symbolic Framework for Reasoning under Perceptual Uncertainty: Bridging Continuous Perception and Discrete Symbolic Planning

Jiahao Wu, Shengwen Yu

TL;DR

This work tackles the challenge of reasoning under perceptual uncertainty by proposing a probabilistic neuro-symbolic framework that ties continuous perception to discrete symbolic planning. It combines a Transformer-GNN translator, which produces probabilistic predicates from visual input, with an uncertainty-aware symbolic planner that can trigger information-gathering actions, all within closed-loop execution. The paper contributes a dependency-aware uncertainty model (MRF-based), convergence guarantees for planning with calibrated uncertainty, and an analytical optimum for planning thresholds. These are backed by empirical validation on 10,047 synthetic tabletop scenes, showing symbol-prediction F1≈$0.68$, average task success ≈$90.7\%$, planning times of $10$–$15$ ms, and gains of $10$–$14$ percentage points over strong POMDP baselines. It also establishes principled links between perception calibration and planning performance, provides actionable design guidelines, and releases datasets and code to support reproducibility and extension to broader domains.
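The closed loop described above can be pictured with a minimal Python sketch, assuming a simple two-threshold rule. A precondition atom whose translator confidence is at least `tau_high` is treated as true, and one at most `tau_low` is treated as false. Anything in between is unknown and triggers an information-gathering action. The names `Predicate`, `classify`, `next_step`, `tau_low` and `tau_high`, and the defaults 0.2 and 0.8, are hypothetical. The paper derives an analytical optimum for its planning thresholds, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    name: str    # grounded atom, e.g. "On(obj_0, obj_1)"
    prob: float  # translator confidence that the atom is true, in [0, 1]


def classify(pred: Predicate, tau_low: float = 0.2, tau_high: float = 0.8) -> str:
    """Three-way decision on one atom using two hypothetical thresholds."""
    if pred.prob >= tau_high:
        return "true"
    if pred.prob <= tau_low:
        return "false"
    return "unknown"


def next_step(preconditions, tau_low=0.2, tau_high=0.8):
    """Decide whether to act, gather information, or replan for one action.

    Returns ("replan", atoms) if any precondition is confidently false,
    ("sense", atoms) if some are uncertain, and ("act", []) otherwise.
    """
    labels = {p.name: classify(p, tau_low, tau_high) for p in preconditions}
    false_atoms = [name for name, lab in labels.items() if lab == "false"]
    if false_atoms:
        return "replan", false_atoms
    unknown_atoms = [name for name, lab in labels.items() if lab == "unknown"]
    if unknown_atoms:
        # Low confidence: observe these atoms before committing to the action.
        return "sense", unknown_atoms
    return "act", []


pre = [Predicate("Clear(obj_1)", 0.93), Predicate("On(obj_0, table)", 0.55)]
print(next_step(pre))  # ('sense', ['On(obj_0, table)'])
```

Checking "confidently false" before "uncertain" is a design choice of this sketch. A confidently violated precondition makes sensing pointless for that action, so the planner replans instead.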

Abstract

Bridging continuous perceptual signals and discrete symbolic reasoning is a fundamental challenge for AI systems that must operate under uncertainty. We present a neuro-symbolic framework that explicitly models and propagates uncertainty from perception to planning, providing a principled connection between these two levels of abstraction. Our approach couples a transformer-based perceptual front-end with graph neural network (GNN) relational reasoning to extract probabilistic symbolic states from visual observations. It pairs this front-end with an uncertainty-aware symbolic planner that actively gathers information when confidence is low. We demonstrate the framework on tabletop robotic manipulation as a concrete application. The translator processes 10,047 PyBullet-generated scenes (3–10 objects) and outputs probabilistic predicates with calibrated confidences (overall F1 = 0.68). When embedded in the planner, the system achieves 94%/90%/88% success on the Simple Stack, Deep Stack, and Clear+Stack benchmarks (90.7% average). This exceeds the strongest POMDP baseline by 10–14 percentage points while planning within 15 ms. A probabilistic graphical-model analysis establishes a quantitative link between calibrated uncertainty and planning convergence, and the resulting theoretical guarantees are validated empirically. The framework is general-purpose and can be applied to other domains that require uncertainty-aware reasoning from perceptual input to symbolic planning.
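The summary does not say which metric underlies "calibrated confidences". One common choice is the binned expected calibration error (ECE), sketched below for binary predicate probabilities. Within each confidence bin, it compares the mean predicted probability with the observed fraction of true predicates and weights the gap by the bin's share of samples. Treat this as an illustration of the concept, using a hypothetical function name and 10 equal-width bins, not as the paper's evaluation code.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predicates.

    ECE = sum_b (n_b / N) * |frac_true(b) - mean_prob(b)| over equal-width bins.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Interior bin edges; np.digitize maps p in [0, 1] to a bin index 0..n_bins-1,
    # with p = 1.0 falling in the last bin.
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_ids = np.digitize(probs, inner_edges)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            frac_true = labels[in_bin].mean()   # observed frequency of True
            mean_prob = probs[in_bin].mean()    # average predicted confidence
            ece += in_bin.mean() * abs(frac_true - mean_prob)
    return float(ece)


# Toy check: confidences of 0.9 and 0.1 that are always right are each off by 0.1.
print(round(expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0]), 3))  # 0.1
```

A low ECE means that a predicate reported at, say, 0.8 confidence is true roughly 80% of the time. Threshold-based planning of the kind described in this summary implicitly relies on that property.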

Paper Structure

This paper contains 86 sections, 10 theorems, 48 equations, 10 figures, 12 tables, and 1 algorithm.

Key Result

Proposition 1

The time complexity of the translator is $O(H \cdot W \cdot C)$, where $H$, $W$, and $C$ are the image height, width, and number of channels. The space complexity is $O(H \cdot W \cdot C + |\Phi|)$, where $|\Phi|$ is the number of possible predicates.
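As a worked instance of Proposition 1, take the 224×224 RGB inputs described for Figure 2, assuming $C = 3$ colour channels, and the 65 candidate predicates implied by the "15/65" label in that figure. Big-O bounds hide constant factors, so these numbers indicate scale only and are not operation counts.

```latex
\begin{aligned}
H \cdot W \cdot C &= 224 \cdot 224 \cdot 3 = 150{,}528,\\
H \cdot W \cdot C + |\Phi| &= 150{,}528 + 65 = 150{,}593.
\end{aligned}
```

At this image size the input term dominates the space bound. The $|\Phi|$ term matters only if the predicate vocabulary grows to a comparable size.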

Figures (10)

  • Figure 1: Complete neuro-symbolic task planning pipeline, from raw visual perception to action plan generation: (a) the RGB image captures the visual appearance of objects in the scene; (b) the depth image provides spatial distance information, with darker colors indicating closer objects; (c) the segmentation mask assigns a distinct color label to each object and surface for instance segmentation; (d) object detection shows bounding boxes with labels (e.g., obj_0, obj_1) identified by the neural-symbolic translator. The translator detects each object with high precision, and each label is attached to its object by a connecting line to avoid occlusion. The neural-symbolic translator processes these multi-modal observations to extract probabilistic symbolic states with confidence scores, which the uncertainty-aware planner then uses to generate action sequences. The figure shows how the system turns multi-modal sensory input into interpretable symbolic representations and actionable plans while explicitly handling perceptual uncertainty throughout the pipeline.
  • Figure 2: Example samples from our training dataset of 10,047 synthetic scenes generated in PyBullet with YCB objects. Each sample consists of an RGB image (224$\times$224 pixels), the corresponding ground-truth symbolic state, and object positions. The dataset covers diverse object configurations, spatial relationships (On, LeftOf, CloseTo, Touching, Clear), and scene complexities (3–10 objects). Labels such as "15/65 true (23.1%)" give the number of true predicates out of all possible relation combinations. Our neural-symbolic translator achieves an overall F1 of 0.68 across this diverse dataset, generalizing across scene configurations and relation types.
  • Figure 3: Conceptual diagram of how our neuro-symbolic framework recognizes objects and extracts symbolic information from visual observations. The neural-symbolic translator processes the input image to detect objects (with confidence scores) and extract spatial relations (as probabilistic predicates). The symbolic planner then uses these probabilistic symbols to generate robust action plans. The diagram illustrates the symbolic representation and how the planner obtains object information from the translator's output.
  • Figure 4: The two simulated tabletop manipulation environments used for our experiments, both running in PyBullet. (a) The Franka Emika Panda robot environment. (b) The UR5 robot environment. Both scenes are populated with a random selection of YCB objects, demonstrating the cluttered and complex scenarios our framework is designed to handle.
  • Figure 5: Success rate comparison on the YCB-Video Complex Stack scenario. Our method maintains a clear lead over neural and classical baselines while approaching the perfect-perception upper bound.
  • ...and 5 more figures
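Figure 2's labels count true atoms against all grounded relation combinations. The caption does not give the grounding rule, so the sketch below assumes one. On and LeftOf are grounded over ordered object pairs, the symmetric CloseTo and Touching over unordered pairs, and Clear once per object. Under that assumption, a five-object scene has 2·20 + 2·10 + 5 = 65 atoms, matching the "15/65" example. In general, $|\Phi| = 3n^2 - 2n$, which gives 21 to 280 atoms for the 3–10 objects in the dataset. The helper `ground_predicates` and the object names are illustrative.

```python
from itertools import combinations, permutations

# Assumed grounding scheme (not stated in the paper summary):
ORDERED = ("On", "LeftOf")            # one atom per ordered pair (a, b), a != b
UNORDERED = ("CloseTo", "Touching")   # symmetric: one atom per unordered pair
UNARY = ("Clear",)                    # one atom per object


def ground_predicates(objects):
    """List every grounded atom for the given object names."""
    atoms = []
    for rel in ORDERED:
        atoms += [f"{rel}({a}, {b})" for a, b in permutations(objects, 2)]
    for rel in UNORDERED:
        atoms += [f"{rel}({a}, {b})" for a, b in combinations(objects, 2)]
    for rel in UNARY:
        atoms += [f"{rel}({a})" for a in objects]
    return atoms


phi = ground_predicates([f"obj_{i}" for i in range(5)])
print(len(phi))                 # 65
print(f"{15 / len(phi):.1%}")   # 23.1%  (15 true atoms, as in the Figure 2 example)
```

Because $|\Phi|$ grows quadratically with the number of objects, the share of true atoms in a typical scene tends to be small. This is consistent with the 23.1% shown in the example label.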

Theorems & Definitions (14)

  • Proposition 1: Translator Complexity
  • Proposition 2: Uncertainty Preservation
  • Theorem 1: Uncertainty Propagation (Independence Case)
  • Proof
  • Theorem 2: Information Gathering Value
  • Corollary 1
  • Theorem 3: Convergence Guarantee with Calibrated Uncertainty
  • Proof
  • Corollary 2
  • Theorem 4: Optimality Guarantee
  • ...and 4 more
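The theorem statements are not reproduced in this summary. For the independence case of Theorem 1, the standard consequence is that the probability that all of an action's precondition atoms hold equals the product of their marginal probabilities. The sketch below uses a hypothetical function name and a made-up joint distribution to show that product and why dependencies matter. On(obj_0, obj_1) and Touching(obj_0, obj_1) are naturally correlated. With marginals of 0.8 each, the independence estimate of 0.64 understates a joint probability of 0.75. This is the kind of gap that the MRF-based, dependency-aware uncertainty model is meant to address.

```python
import math


def precondition_prob_independent(probs):
    """P(all precondition atoms hold), assuming the atoms are mutually independent."""
    return math.prod(probs)


# Hypothetical joint over A = On(obj_0, obj_1) and B = Touching(obj_0, obj_1):
# the two atoms tend to be true together, so they are not independent.
joint = {(1, 1): 0.75, (1, 0): 0.05, (0, 1): 0.05, (0, 0): 0.15}
p_a = sum(p for (a, _), p in joint.items() if a == 1)  # marginal P(A) = 0.8
p_b = sum(p for (_, b), p in joint.items() if b == 1)  # marginal P(B) = 0.8

print(round(precondition_prob_independent([p_a, p_b]), 4))  # 0.64 (independence estimate)
print(joint[(1, 1)])                                          # 0.75 (true joint probability)
```

For long plans, the product over many preconditions shrinks quickly. This is one reason an uncertainty-aware planner may prefer gathering information on the least certain atoms over acting immediately.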