Temporal Knowledge-Graph Memory in a Partially Observable Environment

Taewoon Kim; Vincent François-Lavet; Michael Cochez

TL;DR

This work addresses how to represent and use long-term memory in partially observable environments by giving both the world state and the agent's memory explicit knowledge-graph (KG) representations. It introduces Room Environment v3, a deterministic, KG-centered testbed in which both the hidden state and the observations are RDF graphs, and in which memory can be extended to a temporal KG (TKG) via RDF-star qualifiers. The study compares symbolic KG-based memory (plain RDF, and RDF-star with time_added, last_accessed, and num_recalled) with neural sequence baselines (LSTM and Transformer) under identical observations and query conditions, across a range of memory capacities. Temporal qualifiers substantially improve stability and generalization: at high memory capacity, the symbolic TKG agent achieves roughly fourfold higher test QA accuracy than the neural baselines, and the symbolic agents reach full room coverage while keeping the evolution of their memory transparent. The results demonstrate the value of interpretable, graph-structured memory in partially observable domains and provide a reproducible benchmarking platform for future neuro-symbolic memory research.
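As a minimal sketch of the RDF-star extension described above, the Python snippet below lifts one observed triple into a memory entry that carries the three qualifiers named in the text, and renders it as a Turtle-star statement annotating the quoted triple. The class and function names, the `ex:` prefix, the entity names (`agent`, `room_000`), and the initial qualifier values (last_accessed equal to time_added, num_recalled equal to 0) are hypothetical illustrations, not the released implementation; only `at_location` and the qualifier names come from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TemporalMemory:
    """One RDF triple plus its temporal qualifiers (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str
    time_added: int
    last_accessed: int
    num_recalled: int = 0

    def to_turtle_star(self, prefix: str = "ex") -> str:
        """Annotate the quoted triple << s p o >> with the three qualifiers."""
        quoted = (f"<< {prefix}:{self.subject} {prefix}:{self.predicate} "
                  f"{prefix}:{self.obj} >>")
        return (f"{quoted} {prefix}:time_added {self.time_added} ; "
                f"{prefix}:last_accessed {self.last_accessed} ; "
                f"{prefix}:num_recalled {self.num_recalled} .")


def observe_to_memory(triple: Tuple[str, str, str], t: int) -> TemporalMemory:
    """Lift an observed triple at step t; the initial qualifier values are assumed."""
    s, p, o = triple
    return TemporalMemory(s, p, o, time_added=t, last_accessed=t)


mem = observe_to_memory(("agent", "at_location", "room_000"), t=3)
print(mem.to_turtle_star())
# << ex:agent ex:at_location ex:room_000 >> ex:time_added 3 ; ex:last_accessed 3 ; ex:num_recalled 0 .
```

A plain RDF memory would keep only the quoted triple; the qualifiers are what let an agent distinguish an old sighting from a fresh one.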

Abstract

Agents in partially observable environments require persistent memory to integrate observations over time. While knowledge graphs (KGs) provide a natural representation for such evolving state, existing benchmarks rarely expose agents to environments where both the world dynamics and the agent's memory are explicitly graph-shaped. We introduce the Room Environment v3, a configurable environment whose hidden state is an RDF KG and whose observations are RDF triples. The agent may extend these observations into a temporal KG (TKG) when storing them in long-term memory. The grid size, number of rooms, inner walls, and moving objects are all easily adjustable. We define a lightweight TKG memory for agents, based on RDF-star-style qualifiers (time_added, last_accessed, num_recalled), and evaluate several symbolic baselines that maintain and query this memory under different capacity constraints. Two neural sequence models (LSTM and Transformer) serve as contrasting baselines without explicit KG structure. Agents train on one layout and are evaluated on a held-out layout with the same dynamics but a different query order, which exposes train–test generalization gaps. In this setting, temporal qualifiers lead to more stable performance, and the symbolic TKG agent achieves roughly fourfold higher test question-answering (QA) accuracy than the neural baselines under the same environment and query conditions. The environment, agent implementations, and experimental scripts are released for reproducible research at https://github.com/humemai/agent-room-env-v3 and https://github.com/humemai/room-env.
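The capacity-constrained memory described in the abstract can be pictured with the following minimal, hypothetical Python sketch; it is not the authors' agent. Each observation is stored as a new qualified entry, so one main triple can recur with different qualifiers (as the counts in Figure 4 suggest). A location query returns the most recently added `at_location` fact and updates `last_accessed` and `num_recalled`. The three eviction rules (`oldest`, `lru`, `lfu`), the class and method names, and the example entities are assumed stand-ins for whatever policies and queries the symbolic baselines actually use.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Entry:
    triple: Triple
    time_added: int
    last_accessed: int
    num_recalled: int = 0


class TemporalKGMemory:
    """Capacity-bounded long-term memory of qualified triples (hypothetical)."""

    def __init__(self, capacity: int, policy: str = "oldest") -> None:
        if policy not in ("oldest", "lru", "lfu"):
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity
        self.policy = policy  # assumed eviction rules, not the paper's
        self.entries: List[Entry] = []

    def _eviction_key(self, e: Entry) -> Tuple[int, ...]:
        # The entry with the smallest key is evicted first.
        if self.policy == "oldest":
            return (e.time_added,)
        if self.policy == "lru":
            return (e.last_accessed,)
        return (e.num_recalled, e.last_accessed)  # "lfu"; ties go to least recent

    def store(self, triple: Triple, t: int) -> None:
        # Each observation is a new entry, even if the main triple already exists.
        self.entries.append(Entry(triple, time_added=t, last_accessed=t))
        if len(self.entries) > self.capacity:
            idx = min(range(len(self.entries)),
                      key=lambda i: self._eviction_key(self.entries[i]))
            del self.entries[idx]

    def where_is(self, obj: str, t: int) -> Optional[str]:
        """Answer 'where is obj?' from the most recently added at_location entry."""
        hits = [e for e in self.entries
                if e.triple[0] == obj and e.triple[1] == "at_location"]
        if not hits:
            return None
        best = max(hits, key=lambda e: e.time_added)
        best.last_accessed = t  # recalling a memory updates its qualifiers
        best.num_recalled += 1
        return best.triple[2]

    def main_triple_counts(self) -> Counter:
        """Number of entries sharing each main triple."""
        return Counter(e.triple for e in self.entries)


mem = TemporalKGMemory(capacity=2, policy="oldest")
mem.store(("chair", "at_location", "room_a"), t=0)
mem.store(("chair", "at_location", "room_b"), t=1)
mem.store(("chair", "at_location", "room_b"), t=2)  # over capacity: drops the t=0 entry
print(mem.where_is("chair", t=3))  # room_b
print(mem.main_triple_counts())
```

Because qualifiers are plain integers on each entry, swapping the eviction rule only changes the sort key, which is one way several symbolic baselines could share a single memory structure.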

Paper Structure (36 sections, 8 equations, 4 figures, 1 table)

Introduction
Code and environment.
Background
RDF, RDF-star, and Knowledge Graph Representations
Temporal Knowledge Graphs
Symbolic Memory and Semantic State Tracking
Semantic Representations for Agents and Interactive Environments
Room Environment v3
Hidden State and Dynamics
Observations as RDF Graph Fragments
Deterministic Question–Move Loop
Configurable Layouts and Test Split
Memory Interface (Agent-Agnostic)
Default Configuration
Agents
...and 21 more sections

Figures (4)

Figure 1: Two views of the same hidden state ($s_{t=99}$). The agent’s observation ($o_t$) is the induced RDF subgraph of the current room and its visible adjacency relations. Full step-by-step visualizations and convergence analysis are provided in the anonymous repository.
Figure 2: Train–test QA accuracy for all agents across long-term memory capacities. Mean and standard deviation are computed across $5$ seeds. Raw values can be found in the anonymous repository.
Figure 3: Coverage metrics for the four agents at a long-term memory capacity of 512. Neural agents halt exploration early, while symbolic agents visit all $49$ rooms and accumulate nearly all unique triples. Raw values can be found in the anonymous repository.
Figure 4: Memory state of the RDF-star agent at $t=0$, $t=50$, and $t=99$ (capacity $512$). Counts in parentheses (e.g., "at_location (3)") indicate how many RDF-star memories share the same main triple but differ in their temporal qualifiers.
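Figure 1 describes the observation as the induced RDF subgraph of the current room plus its visible adjacency relations. A minimal Python sketch of that extraction over a set of hidden-state triples is given below. The direction predicates (`north`, `east`, `south`, `west`), the rule that an entity is in view when it is `at_location` the current room, the function name, and all entity names are assumptions made for illustration, not the environment's actual schema.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
DIRECTIONS = {"north", "east", "south", "west"}  # assumed adjacency predicates


def observe(hidden_state: Set[Triple], room: str) -> Set[Triple]:
    """Return the fragment of the hidden RDF graph visible from `room` (hypothetical)."""
    # Nodes in view: the room itself plus every entity located in it.
    in_view = {room} | {s for (s, p, o) in hidden_state
                        if p == "at_location" and o == room}
    # Induced subgraph: triples whose subject and object are both in view.
    induced = {(s, p, o) for (s, p, o) in hidden_state
               if s in in_view and o in in_view}
    # Visible adjacency: the current room's outgoing direction edges.
    adjacency = {(s, p, o) for (s, p, o) in hidden_state
                 if s == room and p in DIRECTIONS}
    return induced | adjacency


hidden = {
    ("room_a", "east", "room_b"),
    ("room_b", "west", "room_a"),
    ("agent", "at_location", "room_a"),
    ("chair", "at_location", "room_b"),
}
print(sorted(observe(hidden, "room_a")))
# [('agent', 'at_location', 'room_a'), ('room_a', 'east', 'room_b')]
```

The chair in the neighboring room stays hidden; only the edge pointing toward that room is observed, which is what makes the environment partially observable.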
