Advancing Multimodal Agent Reasoning with Long-Term Neuro-Symbolic Memory

Rongjie Jiang, Jianwei Wang, Gengda Zhao, Chengyang Luo, Kai Wang, Wenjie Zhang

Abstract

Recent advances in large language models have driven the emergence of intelligent agents operating in open-world, multimodal environments. To support long-term reasoning, such agents are typically equipped with external memory systems. However, most existing multimodal agent memories rely primarily on neural representations and vector-based retrieval, which are well suited to inductive, intuitive reasoning but fundamentally limited in supporting the analytical, deductive reasoning critical for real-world decision making. To address this limitation, we propose NS-Mem, a long-term neuro-symbolic memory framework designed to advance multimodal agent reasoning by integrating neural memory with explicit symbolic structures and rules. Specifically, NS-Mem is organized around three core components of a memory system: (1) a three-layer memory architecture consisting of an episodic layer, a semantic layer, and a logic rule layer; (2) a memory construction and maintenance mechanism, implemented by SK-Gen, that automatically consolidates structured knowledge from accumulated multimodal experiences and incrementally updates both neural representations and symbolic rules; and (3) a hybrid memory retrieval mechanism that combines similarity-based search with deterministic symbolic query functions to support structured reasoning. Experiments on real-world multimodal reasoning benchmarks demonstrate that NS-Mem achieves an average 4.35% improvement in overall reasoning accuracy over pure neural memory systems, with gains of up to 12.5% on constrained reasoning queries, validating its effectiveness.
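
To make component (3) concrete, the following is a minimal sketch of hybrid retrieval, not the NS-Mem implementation. It assumes that each memory entry carries both an embedding and a small dictionary of symbolic facts, that the symbolic query is an exact-match filter applied before similarity ranking, and that cosine similarity is the neural score. The names `MemoryEntry`, `cosine`, and `hybrid_retrieve`, and all example data, are hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class MemoryEntry:
    """Hypothetical memory record: free text, an embedding, and symbolic facts."""
    text: str
    embedding: list[float]
    facts: dict[str, str] = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(memory, query_embedding, constraints, top_k=2):
    """Assumed scheme: deterministic symbolic filter first, similarity ranking second."""
    # Symbolic step: keep only entries whose facts satisfy every constraint exactly.
    candidates = [
        m for m in memory
        if all(m.facts.get(key) == value for key, value in constraints.items())
    ]
    # Neural step: rank the surviving entries by similarity to the query.
    candidates.sort(key=lambda m: cosine(m.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


memory = [
    MemoryEntry("Alice put the keys in the drawer", [0.9, 0.1, 0.0],
                {"actor": "alice", "room": "kitchen"}),
    MemoryEntry("Bob put the keys on the shelf", [0.85, 0.15, 0.0],
                {"actor": "bob", "room": "kitchen"}),
    MemoryEntry("Alice watered the plants", [0.1, 0.9, 0.0],
                {"actor": "alice", "room": "garden"}),
]

# Constrained query: "where did Alice put the keys?"
results = hybrid_retrieve(memory, [1.0, 0.0, 0.0], {"actor": "alice"})
for entry in results:
    print(entry.text)
```

In this toy data, similarity alone would rank the entry about Bob second because its embedding is close to the query. The symbolic filter excludes it deterministically, which is the kind of constrained query the abstract says vector-only retrieval handles poorly.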

Paper Structure (27 sections, 3 theorems, 5 equations, 4 figures, 6 tables, 1 algorithm)

Key Result

Theorem 1

As the number of observations $n \to \infty$, the estimated transition probabilities converge almost surely to the true underlying probabilities: $\hat{P}(v_j | v_i) \xrightarrow{a.s.} P^*(v_j | v_i)$.
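
The convergence claim can be illustrated numerically. The sketch below is not the paper's estimator, which this summary does not specify. It assumes a count-based estimate with a symmetric pseudo-count prior (the posterior mean under a Dirichlet prior) over a small Markov chain. The transition matrix `P_TRUE`, the helper names, and the prior value are all hypothetical.

```python
import random


def simulate_chain(P, n_steps, seed=0):
    """Sample n_steps transitions (i, j) from a Markov chain with row-stochastic matrix P."""
    rng = random.Random(seed)
    states = list(range(len(P)))
    current = 0
    transitions = []
    for _ in range(n_steps):
        nxt = rng.choices(states, weights=P[current])[0]
        transitions.append((current, nxt))
        current = nxt
    return transitions


def estimate_transitions(transitions, n_states, prior=1.0):
    """Smoothed count estimate: (count_ij + prior) / (count_i + n_states * prior)."""
    counts = [[prior] * n_states for _ in range(n_states)]
    for i, j in transitions:
        counts[i][j] += 1
    return [[c / sum(row) for c in row] for row in counts]


def max_abs_error(P_hat, P_true):
    """Largest entrywise gap between the estimated and true matrices."""
    return max(
        abs(p_hat - p_true)
        for row_hat, row_true in zip(P_hat, P_true)
        for p_hat, p_true in zip(row_hat, row_true)
    )


# Hypothetical ground-truth transition probabilities P*(v_j | v_i).
P_TRUE = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]

errors = {}
for n in (100, 10_000, 100_000):
    P_hat = estimate_transitions(simulate_chain(P_TRUE, n), len(P_TRUE))
    errors[n] = max_abs_error(P_hat, P_TRUE)
    print(f"n={n:>7}: max |P_hat - P*| = {errors[n]:.4f}")
```

Because every state in this chain is visited infinitely often, the influence of the pseudo-counts vanishes as $n$ grows and each row estimate approaches the true row, which is the behavior the theorem asserts for $\hat{P}(v_j | v_i)$.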

Figures (4)

  • Figure 1: An example of a vector-centric multimodal agent on a constrained query.
  • Figure 2: Overview of the NS-Mem framework. Raw multimodal data is processed through a three-layer memory prototype, maintained via the SK-Gen mechanism for distillation and incremental updates, and accessed through a hybrid retrieval framework designed for complex reasoning.
  • Figure 3: Case study comparing vector-centric memory and NS-Mem.
  • Figure 4: Hyper-parameter analysis across different thresholds and weights. (a) Impact of $\tau$ on accuracy across query types. (b) Impact of $\delta$ on knowledge consolidation and merging. (c) Impact of $\alpha$ on accuracy and efficiency.

Theorems & Definitions (5)

  • Example 1
  • Definition 1
  • Theorem 1: Posterior Consistency
  • Theorem 2: Fusion Consistency
  • Theorem 3: Determinism Guarantee