Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs
Renfei Zhang, Manasa Kaniselvan, Niloofar Mireshghallah
TL;DR
The paper investigates whether reinforcement learning (RL) improves or harms the recall and retrieval of structured, hierarchical knowledge in large language models. It shows that RL-enhanced models outperform their base and supervised fine-tuned (SFT) counterparts on knowledge recall tasks that require hierarchical navigation, which suggests the gains come from better navigation rather than from newly acquired knowledge. Using structured prompting, retrieval-depth analyses, and layer-wise activation studies, the authors show that much of the RL advantage stems from improved traversal mechanisms: factual representations are largely preserved, while query processing changes. These findings challenge the notion that an alignment tax uniformly degrades memory, and they motivate approaches that optimize knowledge content and hierarchical navigation separately for more efficient reasoning systems.
Abstract
Reinforcement learning (RL) is often credited with improving language model reasoning and generalization at the expense of degrading memorized knowledge. We challenge this narrative by observing that RL-enhanced models consistently outperform their base and supervised fine-tuned (SFT) counterparts on pure knowledge recall tasks, particularly those requiring traversal of hierarchical, structured knowledge (e.g., medical codes). We hypothesize that these gains stem not from newly acquired data, but from improved procedural skills in navigating and searching existing knowledge hierarchies within the model parameters. To support this hypothesis, we show that structured prompting, which explicitly guides SFT models through hierarchical traversal, recovers most of the performance gap (narrowing it from 24pp to 7pp on MedConceptsQA for DeepSeek-V3/R1). We further find that while prompting improves final-answer accuracy, RL-enhanced models retain a superior ability to recall correct procedural paths on deep-retrieval tasks. Finally, our layer-wise internal activation analysis reveals that while factual representations (e.g., activations for the statement "code 57.95 refers to urinary infection") maintain high cosine similarity between SFT and RL models, query representations (e.g., "what is code 57.95") diverge noticeably, indicating that RL primarily transforms how models traverse knowledge rather than the knowledge representation itself.
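To make the layer-wise comparison concrete, the sketch below computes a per-layer cosine similarity between two models' activations for the same input. It is an illustration only, not the authors' analysis code: the function names (`cosine_similarity`, `layerwise_similarity`) and the toy activation values are hypothetical, and how the activations are extracted or pooled is not specified here and is left as an assumption.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length activation vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def layerwise_similarity(
    acts_a: Sequence[Sequence[float]],
    acts_b: Sequence[Sequence[float]],
) -> List[float]:
    """Compare two models layer by layer.

    acts_a[i] and acts_b[i] are assumed to be the activation vectors that
    the two models produce at layer i for the same input text.
    """
    if len(acts_a) != len(acts_b):
        raise ValueError("both models must provide the same number of layers")
    return [cosine_similarity(a, b) for a, b in zip(acts_a, acts_b)]


if __name__ == "__main__":
    # Hypothetical toy activations (3 layers, 4 dimensions), not real model data.
    sft_acts = [[0.9, 0.1, 0.0, 0.2], [0.5, 0.5, 0.1, 0.0], [0.2, 0.8, 0.3, 0.1]]
    rl_acts = [[0.9, 0.1, 0.1, 0.2], [0.4, 0.6, 0.1, 0.1], [0.7, 0.1, 0.0, 0.6]]
    for layer, sim in enumerate(layerwise_similarity(sft_acts, rl_acts)):
        print(f"layer {layer}: cosine similarity = {sim:.3f}")
```

Under this reading, running the comparison once on a factual statement and once on a query, then contrasting the two per-layer curves, would show whether the two models diverge more on queries than on facts.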
