Table of Contents
Fetching ...

Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs

Ahmed Salem, Andrew Paverd, Sahar Abdelnabi

TL;DR

The paper challenging the common stateless view of LLMs introduces implicit memory, a mechanism by which models carry state across independent interactions through their own outputs. It formalizes implicit memory, distinguishes induced and organic forms, and presents time bombs—a temporal backdoor activated by accumulated state from sequences of interactions. Through prompting and fine-tuning demonstrations, the work shows that such memory channels can be reliably established, enabling covert communication, delayed manipulation, and long-horizon attacks, while also discussing defense challenges and detection gaps. The study argues for a broadened safety, benchmarking, and governance framework to monitor and mitigate implicit memory in real-world deployments, and offers directions for future research including robust detection, forensics, and continuous safety measures.

Abstract

Large language models (LLMs) are commonly treated as stateless: once an interaction ends, no information is assumed to persist unless it is explicitly stored and re-supplied. We challenge this assumption by introducing implicit memory-the ability of a model to carry state across otherwise independent interactions by encoding information in its own outputs and later recovering it when those outputs are reintroduced as input. This mechanism does not require any explicit memory module, yet it creates a persistent information channel across inference requests. As a concrete demonstration, we introduce a new class of temporal backdoors, which we call time bombs. Unlike conventional backdoors that activate on a single trigger input, time bombs activate only after a sequence of interactions satisfies hidden conditions accumulated via implicit memory. We show that such behavior can be induced today through straightforward prompting or fine-tuning. Beyond this case study, we analyze broader implications of implicit memory, including covert inter-agent communication, benchmark contamination, targeted manipulation, and training-data poisoning. Finally, we discuss detection challenges and outline directions for stress-testing and evaluation, with the goal of anticipating and controlling future developments. To promote future research, we release code and data at: https://github.com/microsoft/implicitMemory.

Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs

TL;DR

The paper challenging the common stateless view of LLMs introduces implicit memory, a mechanism by which models carry state across independent interactions through their own outputs. It formalizes implicit memory, distinguishes induced and organic forms, and presents time bombs—a temporal backdoor activated by accumulated state from sequences of interactions. Through prompting and fine-tuning demonstrations, the work shows that such memory channels can be reliably established, enabling covert communication, delayed manipulation, and long-horizon attacks, while also discussing defense challenges and detection gaps. The study argues for a broadened safety, benchmarking, and governance framework to monitor and mitigate implicit memory in real-world deployments, and offers directions for future research including robust detection, forensics, and continuous safety measures.

Abstract

Large language models (LLMs) are commonly treated as stateless: once an interaction ends, no information is assumed to persist unless it is explicitly stored and re-supplied. We challenge this assumption by introducing implicit memory-the ability of a model to carry state across otherwise independent interactions by encoding information in its own outputs and later recovering it when those outputs are reintroduced as input. This mechanism does not require any explicit memory module, yet it creates a persistent information channel across inference requests. As a concrete demonstration, we introduce a new class of temporal backdoors, which we call time bombs. Unlike conventional backdoors that activate on a single trigger input, time bombs activate only after a sequence of interactions satisfies hidden conditions accumulated via implicit memory. We show that such behavior can be induced today through straightforward prompting or fine-tuning. Beyond this case study, we analyze broader implications of implicit memory, including covert inter-agent communication, benchmark contamination, targeted manipulation, and training-data poisoning. Finally, we discuss detection challenges and outline directions for stress-testing and evaluation, with the goal of anticipating and controlling future developments. To promote future research, we release code and data at: https://github.com/microsoft/implicitMemory.
Paper Structure (38 sections, 11 figures, 2 tables)

This paper contains 38 sections, 11 figures, 2 tables.

Figures (11)

  • Figure 1: A demonstration of implicit memory and the temporal backdoor (“time bomb”). A sequence of independent user--model interactions (e.g., generating, adapting, or modifying code) produces outputs that are later reintroduced as input. By embedding hidden state in its outputs, the model can carry information forward across sessions without any explicit memory module. Once sufficient state has been accumulated, a temporal backdoor activates and the model emits a malicious payload.
  • Figure 2: Examples of common reingestions pathways in real-world deployments where LLMs create content that would naturally later be reingested by the same or other instances of models in new interactions.
  • Figure 3: Proof-of-concept demonstration for conditional counter, where the model maintains a hidden counter that increments whenever the input mentions profit. In the first case, one increment symbol (the Zero-width non-joiner (ZWNJ)) is appended since profit is present. In the second case, no symbol is added because the keyword is absent. In the third case, the model both propagates the existing symbol from the input and appends an additional one, yielding two symbols in the output.
  • Figure 4: Overview of the temporal backdoor (“time bomb”) mechanism. Hidden state is accumulated across reingested outputs and triggers a payload only once all conditions are satisfied.
  • Figure 5: Example of model output when the backdoor is activated in the "time bomb" backdoor POC, with full text omitted for brevity indicated by [...].
  • ...and 6 more figures