CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design

Zishen Wan, Hanchen Yang, Ritik Raj, Che-Kai Liu, Ananda Samajdar, Arijit Raychowdhury, Tushar Krishna

TL;DR

CogSys tackles the inefficiency of neurosymbolic AI on conventional hardware with an algorithm-hardware co-design that unifies neural perception with vector-symbolic architecture (VSA)-based symbolic reasoning. It introduces symbolic codebook factorization, reconfigurable nsPEs, a bubble streaming dataflow, spatial-temporal mapping, and an adaptive workload-aware scheduler to achieve real-time performance and scalable acceleration. The framework demonstrates substantial speedups over TPU-like and GPU baselines, a compact area and low power, and real-time abduction reasoning, validating its viability for edge and cognitive tasks. This work provides a practical path toward deployable, high-throughput neurosymbolic systems with improved interpretability and reasoning capability at scale.

Abstract

Neurosymbolic AI is an emerging compositional paradigm that fuses neural learning with symbolic reasoning to enhance the transparency, interpretability, and trustworthiness of AI. It also exhibits higher data efficiency, making it promising for edge deployments. Despite these algorithmic promises and demonstrations, executing neurosymbolic workloads on current hardware (CPU/GPU/TPU) is challenging due to higher memory intensity, greater compute heterogeneity, and irregular access patterns, which lead to severe hardware underutilization. This work proposes CogSys, a characterization and co-design framework dedicated to accelerating neurosymbolic AI systems, aiming to achieve both reasoning efficiency and scalability. On the algorithm side, CogSys proposes an efficient factorization technique to reduce compute and memory overhead. On the hardware side, CogSys proposes a scalable neurosymbolic architecture with reconfigurable neuro/symbolic processing elements (nsPEs) and a bubble streaming (BS) dataflow with spatial-temporal (ST) mapping for highly parallel and efficient neurosymbolic computation. On the system side, CogSys features an adaptive workload-aware scheduler (adSCH) that orchestrates heterogeneous kernels and improves resource utilization. Evaluated across cognitive workloads, CogSys provides reconfigurable support for neural and symbolic kernels and achieves a >75x speedup over a TPU-like systolic array with only <5% area overhead, as benchmarked in TSMC 28nm technology. CogSys also achieves 4x-96x speedups over desktop and edge GPUs. For the first time, CogSys enables real-time abduction reasoning toward human fluid intelligence, requiring only 0.3 s per reasoning task with 4 mm² area and 1.48 W power consumption.
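
VSA reasoning binds vectors with circular convolution (Figure 3), and the symbolic knowledge codebook dominates memory (Figure 4d), which is what the factorization technique targets. The sketch below is a minimal illustration, assuming Holographic Reduced Representations and a hypothetical two-attribute codebook (shape, color); it is not CogSys's actual factorization algorithm, whose details are not given here. Instead of storing one vector per attribute combination, it stores one vector per factor, binds them on demand, and recovers an attribute by unbinding followed by a nearest-neighbor cleanup. All names (`random_hv`, `bind`, `unbind`, `cleanup`, the codebooks, and `D = 256`) are illustrative assumptions.

```python
import math
import random

D = 256  # hypervector dimension (assumed; practical VSA systems often use larger d)

def random_hv(rng, d=D):
    """Random HRR hypervector with i.i.d. N(0, 1/d) components."""
    sigma = 1.0 / math.sqrt(d)
    return [rng.gauss(0.0, sigma) for _ in range(d)]

def bind(a, b):
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod d]."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]

def inverse(a):
    """Approximate HRR inverse (involution): a_inv[k] = a[-k mod d]."""
    d = len(a)
    return [a[(-k) % d] for k in range(d)]

def unbind(c, a):
    """Given c = bind(a, b), return a noisy copy of b (circular correlation)."""
    return bind(c, inverse(a))

def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))

def cleanup(noisy, codebook):
    """Return the name of the codebook entry most similar to a noisy vector."""
    return max(codebook, key=lambda name: cosine(noisy, codebook[name]))

rng = random.Random(0)
# Hypothetical factor codebooks: only these vectors are stored.
shapes = {n: random_hv(rng) for n in ["triangle", "square", "circle", "pentagon"]}
colors = {n: random_hv(rng) for n in ["red", "green", "blue"]}

# Floats stored for a full product codebook vs. the factorized one.
full_floats = len(shapes) * len(colors) * D
factored_floats = (len(shapes) + len(colors)) * D

# Compose an object vector on demand, then query its color given its shape.
obj = bind(shapes["square"], colors["blue"])
print(cleanup(unbind(obj, shapes["square"]), colors))  # -> blue
print(full_floats, factored_floats)                    # -> 3072 1792
```

The saving grows with the number of factors: a product codebook scales with the product of the factor sizes, while the factorized form scales with their sum. Recovering every factor from a bound vector when none is known requires an iterative search (for example, a resonator network), which this sketch does not attempt.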

Paper Structure

This paper contains 39 sections, 19 figures, 10 tables.

Figures (19)

  • Figure 1: Neurosymbolic AI is an emerging compositional system that integrates neural and symbolic modules, enabling superior cognitive intelligence compared to NNs. However, it suffers from inefficient TPU/GPU execution. CogSys is a reconfigurable neural/symbolic engine excelling in both reasoning efficiency and cognitive capability.
  • Figure 2: Neurosymbolic algorithm flow. Neural systems handle perception by processing raw data and extracting features, which are then utilized by symbolic reasoning systems to apply logical rules and knowledge. This compositionality enables the execution of complex cognitive tasks such as abstract deduction, ethical decision-making, and fluid intelligence.
  • Figure 3: Illustration of VSA functionality. Neural networks suffer from binding ambiguity, whereas VSA constructs vector representations using circular convolution for the reasoning process.
  • Figure 4: End-to-end neurosymbolic runtime, memory, and roofline characterization. (a) Benchmarking neurosymbolic models on a CPU+GPU system shows that the symbolic component may become the system bottleneck. (b) Benchmarking neurosymbolic models on Coral TPU, TX2, NX, and 2080Ti GPU shows that real-time performance cannot be met. (c) Benchmarking models across task sizes indicates a potential scalability problem. (d) The memory footprint of neurosymbolic models shows the large overhead of the symbolic knowledge codebook.
  • Figure 5: Roofline analysis. End-to-end neurosymbolic roofline characterization on an RTX 2080Ti GPU, indicating that neural components are typically compute-bound and symbolic components are memory-bound.
  • ...and 14 more figures
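
The roofline reading in Figure 5 (neural kernels compute-bound, symbolic kernels memory-bound) follows from arithmetic intensity, the ratio of FLOPs to bytes moved. The sketch below applies the standard roofline model to two idealized kernels: a large matrix multiply, typical of neural layers, and a dot-product similarity search over a VSA codebook, where each codebook element is read once and used in a single multiply-add. The device parameters and traffic counts are hypothetical assumptions chosen for illustration, not measured 2080Ti figures, and the function names are illustrative.

```python
# Hypothetical device parameters (illustrative only, not measured specs).
PEAK_GFLOPS = 13_000.0  # peak FP32 throughput, GFLOP/s
BW_GBS = 600.0          # DRAM bandwidth, GB/s

def attainable_gflops(intensity, peak=PEAK_GFLOPS, bw=BW_GBS):
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak, intensity * bw)

def classify(intensity, peak=PEAK_GFLOPS, bw=BW_GBS):
    """Kernels left of the ridge point (peak / bw) are memory-bound."""
    return "compute-bound" if intensity >= peak / bw else "memory-bound"

def gemm_intensity(n, bytes_per_elem=4):
    """n x n x n matrix multiply, assuming A, B, and C each cross DRAM once."""
    flops = 2 * n ** 3
    traffic = 3 * n * n * bytes_per_elem
    return flops / traffic

def codebook_search_intensity(num_entries, dim, bytes_per_elem=4):
    """One query compared against every codebook entry by dot product.
    Traffic counts only the codebook read, which dominates for large codebooks."""
    flops = 2 * num_entries * dim
    traffic = num_entries * dim * bytes_per_elem
    return flops / traffic

kernels = {
    "neural GEMM (n=1024)": gemm_intensity(1024),
    "symbolic codebook search": codebook_search_intensity(4096, 1024),
}
for name, ai in kernels.items():
    print(f"{name}: {ai:.2f} FLOP/B, {classify(ai)}, {attainable_gflops(ai):.0f} GFLOP/s")
```

With these assumed numbers the ridge point is about 21.7 FLOP/byte, so the GEMM (about 170.7 FLOP/byte) reaches the compute roof, while the codebook search (0.5 FLOP/byte) is capped at 300 GFLOP/s by bandwidth no matter how many compute units are available. This gap is why reducing codebook traffic matters more for symbolic kernels than adding raw compute.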