RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy

Zhenhang Yuan, Shenghai Yuan, Lihua Xie

Abstract

LLM agents often fail in closed-world embodied environments because actions must satisfy strict preconditions -- such as location, inventory, and container states -- and failure feedback is sparse. We identify two structurally coupled failure modes: (P1) invalid action generation and (P2) state drift, each amplifying the other in a degenerative cycle. We present RPMS, a conflict-managed architecture that enforces action feasibility via structured rule retrieval, gates memory applicability via a lightweight belief state, and resolves conflicts between the two sources via rules-first arbitration. On ALFWorld (134 unseen tasks), RPMS achieves 59.7% single-trial success with Llama 3.1 8B (+23.9 pp over baseline) and 98.5% with Claude Sonnet 4.5 (+11.9 pp); of the 8B gain, rule retrieval alone contributes +14.9 pp (statistically significant), making it the dominant factor. A key finding is that episodic memory is conditionally useful: it harms performance on some task types when used without grounding, but becomes a stable net positive once filtered by current state and constrained by explicit action rules. Adapting RPMS to ScienceWorld with GPT-4 yields consistent gains across all ablation conditions (avg. score 54.0 vs. 44.9 for the ReAct baseline), providing transfer evidence that the core mechanisms hold across structurally distinct environments.
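
The headline numbers can be cross-checked with a little arithmetic. The sketch below is illustrative only: the baseline success rates are implied by subtracting the reported gains and are not stated in the abstract, and the task counts assume each percentage is a fraction of the 134 unseen tasks. The function names are hypothetical.

```python
TASKS = 134  # unseen ALFWorld tasks, as stated in the abstract


def implied_baseline(success_pct, gain_pp):
    """Baseline success rate implied by a reported rate and its gain in pp."""
    return round(success_pct - gain_pp, 1)


def nearest_task_count(pct, n=TASKS):
    """Whole number of tasks closest to pct percent of n (assumed reading)."""
    return round(pct / 100 * n)


llama_base = implied_baseline(59.7, 23.9)    # 35.8
claude_base = implied_baseline(98.5, 11.9)   # 86.6

print(llama_base, nearest_task_count(llama_base))    # 35.8 48
print(59.7, nearest_task_count(59.7))                # 59.7 80
print(claude_base, nearest_task_count(claude_base))  # 86.6 116
print(98.5, nearest_task_count(98.5))                # 98.5 132
print(14.9, nearest_task_count(14.9))                # 14.9 20
```

Under these assumptions the reported percentages line up with whole task counts out of 134 (for example, 80 of 134 is 59.7%), and the +14.9 pp attributed to rule retrieval corresponds to about 20 additional tasks solved.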

Paper Structure

This paper contains 63 sections, 7 equations, 5 figures, 10 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Representative methods positioned along C1 (executability enforcement) and C2 (state consistency control): ReAct [yao2023react], Generative Agents [10.1145/3586183.3606763], Reflexion [shinn2023reflexion], MemGPT [packer2024memgptllmsoperatingsystems], Inner Monologue [huang2022innermonologue], Voyager [wang2023voyager], LATS [pmlr-v235-zhou24r], Action Attention [wu2022tackling], Imperative Learning [doi:10.1177/02783649251353181], CAPE [raman2024capecorrectiveactionsprecondition], NeSyC [choi2025nesyc]. Most prior work addresses one axis; RPMS targets both.
  • Figure 2: RPMS architecture. Each step: (1) parse observation into BeliefState and GoalSpec; (2) query Rule Manual and Episodic Memory in parallel; (3) filter experiences by state-signature compatibility; (4) resolve conflicts via Rules-First Arbitration; (5) query LLM with augmented prompt.
  • Figure 3: RPMS agent architecture overview, showing rule injection (C1: executability enforcement) and state-consistent memory filtering (C2: state consistency control) as the two core components that augment the LLM decision loop.
  • Figure 4: 2$\times$2 ablation results on ALFWorld (left, success rate %; backbone: Llama 3.1 8B) and ScienceWorld (right, avg. score 0--100; backbone: GPT-4). Dotted lines mark the additive-expectation baseline in each environment.
  • Figure 5: Learning curve: success rate vs. learning rounds for Memory-only and Rules+Memory configurations.
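
The per-step loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names, the `Rule` and `Experience` schemas, the choice of state signature, and the two example rules are all hypothetical. Observation parsing, GoalSpec, and retrieval (steps 1 and 2) are replaced by hand-written inputs, and the LLM call in step 5 is reduced to building the prompt string.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class BeliefState:
    """Lightweight belief about the agent's situation (hypothetical fields)."""
    location: str
    holding: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    """An action precondition from the rule manual (hypothetical schema)."""
    verb: str
    text: str
    allows: Callable[[BeliefState, str], bool]


@dataclass(frozen=True)
class Experience:
    """A stored hint plus the state signature it was recorded under."""
    signature: Tuple[str, bool]
    action: str  # e.g. "take mug 1"


def state_signature(belief):
    # Assumed signature: where the agent is and whether its hand is occupied.
    return (belief.location, belief.holding is not None)


def filter_experiences(memory, belief):
    """Step 3: keep experiences whose signature matches the current state."""
    sig = state_signature(belief)
    return [e for e in memory if e.signature == sig]


def arbitrate(rules, hints, belief):
    """Step 4: rules-first arbitration. A memory hint survives only if no
    rule for its verb forbids it in the current belief state."""
    kept = []
    for hint in hints:
        verb, _, target = hint.action.partition(" ")
        if all(r.allows(belief, target) for r in rules if r.verb == verb):
            kept.append(hint)
    return kept


def build_prompt(observation, rules, hints):
    """Step 5: assemble the augmented prompt that would be sent to the LLM."""
    lines = ["Observation: " + observation, "Rules:"]
    lines += ["- " + r.text for r in rules]
    lines.append("Relevant experience:")
    lines += ["- " + h.action for h in hints] or ["- (none)"]
    return "\n".join(lines)


# Illustrative rules and memories, invented for this sketch.
rules = [
    Rule("take", "You can take an object only with an empty hand.",
         lambda b, target: b.holding is None),
    Rule("put", "You can put an object down only if you are holding it.",
         lambda b, target: b.holding is not None),
]
memory = [
    Experience(("countertop 1", False), "take mug 1"),  # compatible and legal
    Experience(("countertop 1", True), "put mug 1"),    # signature mismatch
    Experience(("countertop 1", False), "put mug 1"),   # matches, breaks a rule
]

belief = BeliefState(location="countertop 1")
hints = arbitrate(rules, filter_experiences(memory, belief), belief)
prompt = build_prompt("You are at countertop 1. You see a mug 1.", rules, hints)
print(prompt)
```

In this toy run, one memory is dropped by the signature filter and another by arbitration, so only "take mug 1" reaches the prompt. Filtering first and arbitrating second mirrors the ordering in the caption: memory is gated by the current state before rules are allowed to veto what remains.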