Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval

Yingyi Zhang, Junyi Li, Wenlin Zhang, Penyue Jia, Xianneng Li, Yichao Wang, Derong Xu, Yi Wen, Huifeng Guo, Yong Liu, Xiangyu Zhao

TL;DR

RF-Mem (Recollection-Familiarity Memory Retrieval), a familiarity uncertainty-guided dual-path memory retriever, embeds human-like dual-process recognition into the retriever, avoiding full-context overhead and enabling scalable, adaptive personalization.

Abstract

Personalized large language models (LLMs) rely on memory retrieval to incorporate user-specific histories, preferences, and contexts. Existing approaches either overload the LLM by feeding all the user's past memory into the prompt, which is costly and unscalable, or simplify retrieval into a one-shot similarity search, which captures only surface matches. Cognitive science, however, shows that human memory operates through a dual process: Familiarity, offering fast but coarse recognition, and Recollection, enabling deliberate, chain-like reconstruction for deeply recovering episodic content. Current systems lack both the ability to perform recollection retrieval and mechanisms to adaptively switch between the dual retrieval paths, leading to either insufficient recall or the inclusion of noise. To address this, we propose RF-Mem (Recollection-Familiarity Memory Retrieval), a familiarity uncertainty-guided dual-path memory retriever. RF-Mem measures the familiarity signal through the mean score and entropy. High familiarity leads to the direct top-K Familiarity retrieval path, while low familiarity activates the Recollection path. In the Recollection path, the system clusters candidate memories and applies alpha-mix with the query to iteratively expand evidence in embedding space, simulating deliberate contextual reconstruction. This design embeds human-like dual-process recognition into the retriever, avoiding full-context overhead and enabling scalable, adaptive personalization. Experiments across three benchmarks and corpus scales demonstrate that RF-Mem consistently outperforms both one-shot retrieval and full-context reasoning under fixed budget and latency constraints. Our code can be found in the Reproducibility Statement.
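
The gating and expansion steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the thresholds `mean_thr` and `entropy_thr`, the score scale `lam`, and the `expand` budget are all hypothetical, and the clustering step is simplified to a single centroid over the evidence collected so far.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def familiarity_signal(scores: Sequence[float], lam: float = 10.0) -> Tuple[float, float]:
    """Return (mean score, entropy of softmax(lam * scores) scaled to [0, 1])."""
    mean = sum(scores) / len(scores)
    top = max(scores)
    weights = [math.exp(lam * (s - top)) for s in scores]  # shifted for stability
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(scores)) if len(scores) > 1 else 1.0
    return mean, entropy / max_entropy


def rank(probe: Vector, memories: Sequence[Vector], exclude: Sequence[int] = ()) -> List[int]:
    """Indices of non-excluded memories, most similar to the probe first."""
    candidates = [i for i in range(len(memories)) if i not in exclude]
    return sorted(candidates, key=lambda i: cosine(probe, memories[i]), reverse=True)


def retrieve(
    query: Vector,
    memories: Sequence[Vector],
    k: int = 2,
    mean_thr: float = 0.4,     # hypothetical threshold on the mean score
    entropy_thr: float = 0.5,  # hypothetical threshold on normalized entropy
    alpha: float = 0.5,        # weight of the query in the alpha-mix
    rounds: int = 2,
    expand: int = 1,           # new memories added per recollection round
) -> Tuple[str, List[int]]:
    hits = rank(query, memories)[:k]
    if not hits:
        return "familiarity", []
    mean, entropy = familiarity_signal([cosine(query, memories[i]) for i in hits])

    # High familiarity (assumed here: high mean, low entropy): return top-K directly.
    if mean >= mean_thr and entropy <= entropy_thr:
        return "familiarity", hits

    # Low familiarity: mix the query with the centroid of the evidence so far
    # and use the mixed probe to pull in memories not yet retrieved.
    evidence = list(hits)
    for _ in range(rounds):
        centroid = [
            sum(memories[i][d] for i in evidence) / len(evidence)
            for d in range(len(query))
        ]
        probe = [alpha * q + (1.0 - alpha) * c for q, c in zip(query, centroid)]
        evidence.extend(rank(probe, memories, exclude=evidence)[:expand])
    return "recollection", evidence
```

With the toy memories `[[1, 1, 0], [0.9, 1, 0], [0, 1, 0], [0.3, 0, 1], [-1, 0, 0]]` and the query `[1, 0, 0]`, the two top hits score almost equally, so the normalized entropy is high and the Recollection path runs. The mixed probe then retrieves memory 2, which shares a direction with the first two hits, even though plain query similarity ranks memory 3 ahead of it.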

Paper Structure

This paper contains 52 sections, 6 theorems, 22 equations, 19 figures, 17 tables, 3 algorithms.

Key Result

Lemma 1

Assume the similarity distribution admits a monotone likelihood ratio in $s_i$ between relevant and nonrelevant items, and that the probe softmax temperature $\lambda$ is fixed. Then $\mathcal{E}(\text{Familiarity}\mid q)$ is nonincreasing in $\bar{s}$ and nondecreasing in $H(p)$, while $C(\text{Fam
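
For reference, the two proxy signals named in the lemma can be written out explicitly. This is a sketch under assumptions that the truncated statement above does not confirm: $K$ is the number of probe candidates, $s_1, \dots, s_K$ are their similarity scores, and $\lambda$ multiplies the scores inside the softmax (the paper may instead divide by it).

$$
p_i = \frac{\exp(\lambda s_i)}{\sum_{j=1}^{K} \exp(\lambda s_j)}, \qquad
\bar{s} = \frac{1}{K} \sum_{i=1}^{K} s_i, \qquad
H(p) = -\sum_{i=1}^{K} p_i \log p_i .
$$

Under this reading, a large $\bar{s}$ together with a small $H(p)$ describes a confident, peaked match, which is the regime in which the lemma's monotonicity makes $\mathcal{E}(\text{Familiarity}\mid q)$ smallest.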

Figures (19)

  • Figure 1: Comparison between standard familiarity-based retrieval and recollection-based retrieval in user health narratives. The brain illustration is motivated by (rugg2007event; yonelinas2024role).
  • Figure 2: The overall architecture of RF-Mem. A dual-process memory retrieval system dynamically switches between the Familiarity and the Recollection paths.
  • Figure 3: Illustration of the adaptive study setup. Offline indexes (e.g., MemoryBank or the original memory) provide different storage, while RF-Mem serves as an online retrieval layer that adapts to them.
  • Figure 4: Illustration of the adaptive study setup. Nearline query expansion (e.g., HyDE) enriches the query representation, and RF-Mem operates as the online retrieval layer.
  • Figure 5: Illustration of the adaptive study setup. Iterative RAG (e.g., Search-o1) provides multi-turn retrieval for answer generation, while RF-Mem serves as the retrieval layer that adapts to it.
  • ...and 14 more figures

Theorems & Definitions (6)

  • Lemma 1: Monotonicity of proxy signals
  • Theorem 1: Threshold optimality within monotone policies
  • Lemma 2: Entropy certificate
  • Proposition 1: Bound on familiarity error under low entropy
  • Proposition 2: Gating error bound via concentration
  • Proposition 3: Complexity bounds