Towards Efficient Neuro-Symbolic AI: From Workload Characterization to Hardware Architecture

Zishen Wan; Che-Kai Liu; Hanchen Yang; Ritik Raj; Chaojian Li; Haoran You; Yonggan Fu; Cheng Wan; Sixu Li; Youbin Kim; Ananda Samajdar; Yingyan Celine Lin; Mohamed Ibrahim; Jan M. Rabaey; Tushar Krishna; Arijit Raychowdhury

TL;DR

This paper systematically categorizes neuro-symbolic AI algorithms, and experimentally evaluates them in terms of runtime, memory, computational operators, sparsity, and system characteristics on CPUs, GPUs, and edge SoCs, revealing that neuro-symbolic models suffer from inefficiencies on off-the-shelf hardware.

Abstract

The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, are facing challenges surrounding unsustainable computational trajectories, limited robustness, and a lack of explainability. To develop next-generation cognitive AI systems, neuro-symbolic AI emerges as a promising paradigm, fusing neural and symbolic approaches to enhance interpretability, robustness, and trustworthiness, while facilitating learning from much less data. Recent neuro-symbolic systems have demonstrated great potential in collaborative human-AI scenarios with reasoning and cognitive capabilities. In this paper, we aim to understand the workload characteristics and potential architectures for neuro-symbolic AI. We first systematically categorize neuro-symbolic AI algorithms, and then experimentally evaluate and analyze them in terms of runtime, memory, computational operators, sparsity, and system characteristics on CPUs, GPUs, and edge SoCs. Our studies reveal that neuro-symbolic models suffer from inefficiencies on off-the-shelf hardware, due to the memory-bound nature of vector-symbolic and logical operations, complex flow control, data dependencies, sparsity variations, and limited scalability. Based on profiling insights, we suggest cross-layer optimization solutions and present a hardware acceleration case study for vector-symbolic architecture to improve the performance, efficiency, and scalability of neuro-symbolic computing. Finally, we discuss the challenges and potential future directions of neuro-symbolic AI from both system and architectural perspectives.

Paper Structure (30 sections, 3 equations, 13 figures, 7 tables)

Introduction
Neuro-Symbolic AI Algorithms
Representative Neuro-Symbolic Models
Model Overview.
Logical Neural Network (LNN)
Logical Tensor Network (LTN)
Neuro-Vector-Symbolic Architecture (NVSA)
Neural Logic Machine (NLM)
Vector Symbolic Architecture-Based Image-to-Image Translation (VSAIT)
Zero-Shot Concept Recognition and Acquisition (ZeroC)
Probabilistic Abduction and Execution (PrAE) Learner
Workload Characterization Methodology
Workload Profiling Methodology
Workload Characterization Taxonomy
Workload Characterization Results
...and 15 more sections

Figures (13)

Figure 1: Overview of neuro-symbolic AI systems, workload characterizations, optimization solutions, challenges, and research opportunities in improving the performance of next-generation cognitive AI.
Figure 1: Survey of recent neuro-symbolic AI algorithms, grouped into five categories, with their underlying operations and vector formats.
Figure 2: Neural and symbolic runtime latency characterization. (a) Seven representative neuro-symbolic workloads (LNN, LTN, NVSA, NLM, VSAIT, ZeroC, PrAE) benchmarked on the CPU+GPU system, showing that the symbolic component may be the system bottleneck. (b) NVSA and NLM workloads benchmarked on Jetson TX2, Xavier NX, and RTX GPU, showing that real-time performance requirements are not met. (c) NVSA workload benchmarked across RPM task sizes on RTX GPU, indicating a potential scalability problem and a consistent symbolic bottleneck.
Figure 3: Selected neuro-symbolic AI workloads for analysis, representing a diverse set of categories, applications, and computational patterns.
Figure 3: Compute operator, memory, and roofline characterization. (a) Compute operator runtime ratio of representative neuro-symbolic workloads, indicating that neural operations consist mainly of MatMul and Conv, while symbolic operations consist of vector/tensor operations. (b) Memory usage during computation and (c) roofline analysis on RTX 2080Ti GPU, showing that neural operations are typically compute-bound and symbolic operations are memory-bound.
...and 8 more figures
