Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews

Samantha Chan, Pat Pataranutaporn, Aditya Suri, Wazeer Zulfikar, Pattie Maes, Elizabeth F. Loftus

TL;DR

Results show the generative chatbot condition significantly increased false memory formation, inducing over 3 times more immediate false memories than the control and 1.7 times more than the survey method.

Abstract

This study examines the impact of AI on human false memories -- recollections of events that did not occur or deviate from actual occurrences. It explores false memory induction through suggestive questioning in Human-AI interactions, simulating crime witness interviews. Four conditions were tested: control, survey-based, pre-scripted chatbot, and generative chatbot using a large language model (LLM). Participants (N=200) watched a crime video, then interacted with their assigned AI interviewer or survey, answering questions including five misleading ones. False memories were assessed immediately and after one week. Results show the generative chatbot condition significantly increased false memory formation, inducing over 3 times more immediate false memories than the control and 1.7 times more than the survey method. 36.4% of users' responses to the generative chatbot were misled through the interaction. After one week, the number of false memories induced by generative chatbots remained constant. However, confidence in these false memories remained higher than the control after one week. Moderating factors were explored: users who were less familiar with chatbots but more familiar with AI technology, and more interested in crime investigations, were more susceptible to false memories. These findings highlight the potential risks of using advanced AI in sensitive contexts, like police interviews, emphasizing the need for ethical considerations.

Paper Structure (37 sections, 5 figures, 16 tables)


Figures (5)

  • Figure 1: Manipulation of Eyewitness Memory by AI: This figure illustrates the process of AI-induced false memories in three stages. It begins with a person witnessing a crime scene involving a knife, then shows an AI system introducing misinformation by asking about a non-existent gun, and concludes with the witness developing a false memory of a gun at the scene. This sequence demonstrates how AI-guided questioning can distort human recall, potentially compromising the reliability of eyewitness testimony and highlighting the ethical concerns surrounding AI's influence on human memory and perception.
  • Figure 2: Experimental Design for Studying AI-Induced False Memories: This figure outlines a two-phase study on AI-induced false memories. In Phase 1, participants watch a CCTV crime video, complete emotional assessments and filler tasks, and are randomly assigned to one of four conditions: control, survey-based, pre-scripted chatbot, or generative chatbot. They then undergo cognitive load assessment and answer questions about the video. Phase 2, conducted one week later, involves participants recalling the video and answering the same questions, allowing researchers to measure the persistence of potential false memories induced by different AI interactions.
  • Figure 3: Left: The silent CCTV video of a crime scene (2 minutes 30 seconds long) shown to participants. Right: Interface of the AI police chatbot used to question participants about the witnessed event.
  • Figure 4: (Left) Average number of immediate false memories, analyzed using a one-way Kruskal–Wallis test and a post hoc Dunn test with FDR correction. (Right) Confidence in immediate false memories, analyzed using the same tests. The error bars represent the 95% confidence interval. P-value annotation legend: *, $P<0.05$; **, $P<0.01$; ****, $P<0.0001$.
  • Figure 5: (Left) Differences in the number of false memories between the immediate assessment and one week later, analyzed using Wilcoxon signed-rank tests. (Right) Confidence in false memories after one week, analyzed using a one-way Kruskal–Wallis test. The error bars represent the 95% confidence interval, and the centre of each error bar marks the average number. P-value annotation legend: *, $P<0.05$; **, $P<0.01$.
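
The captions for Figures 4 and 5 name a Kruskal–Wallis test followed by Dunn post hoc comparisons with FDR correction. As a rough illustration of two of those steps, the sketch below computes the tie-corrected Kruskal–Wallis H statistic and applies a Benjamini-Hochberg adjustment to a list of p-values. Several things here are assumptions rather than details from the paper: the source only says "FDR", so Benjamini-Hochberg is a guess at the procedure; the function names (`average_ranks`, `kruskal_wallis_h`, `benjamini_hochberg`) are hypothetical; and every number in the demo is invented toy data, not study data. The Dunn test itself and the conversion of H to a p-value are left out.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two observations")
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over each tied value count t.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented false-memory counts (0 to 5) for four conditions; NOT study data.
    toy_groups = [
        [0, 0, 1, 0, 1],  # control
        [1, 1, 2, 0, 2],  # survey-based
        [1, 2, 1, 1, 2],  # pre-scripted chatbot
        [2, 3, 1, 3, 2],  # generative chatbot
    ]
    print("H =", kruskal_wallis_h(toy_groups))
    # Invented pairwise p-values standing in for post hoc comparisons.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In practice, an established statistics package would normally be used for these tests. The pure-Python version is only meant to make the ranking, tie-correction, and adjustment steps visible.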