The AI Memory Gap: Users Misremember What They Created With AI or Without

Tim Zindulka, Sven Goller, Daniela Fernandes, Robin Welsch, Daniel Buschek

TL;DR

This work addresses memory for authorship in AI-assisted writing, showing that users systematically misattribute sources when AI is involved. It uses a preregistered, two-phase, within-subject experiment in which 184 participants generated ideas and elaborations with or without a chat AI, followed about one week later by a memory test that included distractor items. The results reveal a robust AI memory gap: source memory declines with AI involvement, especially in mixed workflows, and is better when AI use is consistent across ideation and elaboration; confidence often exceeds accuracy. A two-component Multinomial Processing Tree model confirms distinct memory and guessing processes for ideas versus elaborations. The findings motivate explicit provenance, memory-aware UI design, and further research into how different AI roles and interfaces shape memory and responsibility in creative work.

Abstract

As large language models (LLMs) become embedded in interactive text generation, disclosure of AI as a source depends on people remembering which ideas or texts came from themselves and which were created with AI. We investigate how accurately people remember the source of content when using AI. In a pre-registered experiment, 184 participants generated and elaborated on ideas both unaided and with an LLM-based chatbot. One week later, they were asked to identify the source (noAI vs withAI) of these ideas and texts. Our findings reveal a significant gap in memory: After AI use, the odds of correct attribution dropped, with the steepest decline in mixed human-AI workflows, where either the idea or elaboration was created with AI. We validated our results using a computational model of source memory. Discussing broader implications, we highlight the importance of considering source confusion in the design and use of interactive text generation technologies.

Paper Structure

This paper contains 73 sections, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Study design and procedure. The experiment consisted of two phases separated by one week. In Phase 1 (top), participants first generated five short ideas (1-3 keywords each) for eight problem statements, alternating between conditions with and without AI assistance (top left). Second, they elaborated on each idea with a one-sentence explanation, again alternating between AI-assisted and unassisted conditions (top right). Problem order and withAI/noAI order were counterbalanced across participants. Phase 2 (bottom) started after around one week (accessible after 6 days with a 48-hour completion window). In Phase 2, participants saw 60 elaborations (40 self-generated and 20 distractors) together with the original problem statements. For each, they indicated whether they remembered working on this item, and if so, attributed the source of both the idea and the elaboration (self vs. AI), along with confidence ratings. Distractors included both elaborations from familiar problems (known-topic) and from unseen problems (unknown-topic). This procedure implemented a 2×2 within-subjects design (noAI vs. withAI during ideation and elaboration) to measure item memory and source attribution.
  • Figure 2: Creating ideas and elaborations in Phase 1: (A) In the ideation phase participants entered five ideas for each problem into the respective input fields. (B) After creating ideas for every problem, they elaborated on those ideas in one sentence each. AI-support alternated for each problem in the ideation task and each idea in the elaboration task. Therefore all participants created half of the ideas and elaborations with AI support (C) and the other half without (D). The progress bar (E) indicates how far along participants are in the study.
  • Figure 3: Memory and source attribution in Phase 2: We displayed the task description, the original problem statement, and the solution text created in Phase 1. Participants first answered whether they remembered working on a solution. If so, they indicated whether they came up with the underlying idea and the solution text on their own or with AI support. They also provided confidence ratings (0-100) for each.
  • Figure 4: Model-based interaction plots of predicted source attribution accuracy for three types of memory: whether participants remembered working on the solution, remembered the source of the idea, and remembered the source of the elaboration. Probabilities are estimated from binomial GLMMs with participants as random intercepts. Error bars represent 95% confidence intervals. The plots show that item memory was high across conditions, while idea and elaboration source memory exhibited strong interactions: Ideas created by participants without AI were better remembered when the elaboration was also created without AI. In contrast, ideas generated with AI were better remembered when AI was also used to elaborate on them.
  • Figure 5: Confidence by idea and elaboration ground truth sources. Points show estimated marginal means; error bars show adjusted 95% CIs.
  • ...and 4 more figures
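
The item counts and the 2×2 design described in the Figure 1 caption can be checked with a short sketch. This is only an illustration under stated assumptions: the constant and function names are hypothetical, and the "consistent"/"mixed" labels follow the abstract's description of mixed workflows as those where either the idea or the elaboration (but not both) was created with AI. It does not reproduce the authors' materials or their counterbalancing scheme.

```python
from itertools import product

# Numbers taken from the Figure 1 caption.
PROBLEMS = 8            # problem statements in Phase 1
IDEAS_PER_PROBLEM = 5   # short ideas generated per problem
DISTRACTORS = 20        # distractor elaborations shown in Phase 2

# The two source levels used for both ideation and elaboration.
SOURCES = ("noAI", "withAI")


def workflow_type(idea_source: str, elaboration_source: str) -> str:
    """Label a design cell (hypothetical helper, not from the paper)."""
    return "consistent" if idea_source == elaboration_source else "mixed"


# Enumerate the four cells of the 2x2 within-subjects design.
cells = {
    (idea, elab): workflow_type(idea, elab)
    for idea, elab in product(SOURCES, SOURCES)
}

# Self-generated items per participant and total items shown in Phase 2.
self_generated = PROBLEMS * IDEAS_PER_PROBLEM
phase2_items = self_generated + DISTRACTORS

if __name__ == "__main__":
    for (idea, elab), label in cells.items():
        print(f"idea={idea:7s} elaboration={elab:7s} -> {label}")
    print("self-generated items:", self_generated)
    print("Phase 2 items:", phase2_items)
```

The totals match the caption: 8 problems with 5 ideas each give 40 self-generated elaborations, and adding 20 distractors gives the 60 items shown in Phase 2.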