Negation-Induced Forgetting in LLMs

Francesca Capuano, Ellen Boschert, Barbara Kaup

TL;DR

This paper investigates whether negation-induced forgetting (NIF), a human memory bias, occurs in large language models (LLMs). It adapts the verification/free-recall paradigm of Zang et al. to ChatGPT-3.5, GPT-4o-mini, and LLaMA-3-70B. The design comprises a pilot study used to calibrate statistical power and two preregistered main experiments with two story versions and a distraction task. Findings show robust NIF in ChatGPT-3.5, a marginal NIF effect in GPT-4o-mini, and no NIF in LLaMA-3-70B. This suggests model- and architecture-specific memory biases, possibly arising from attention dynamics or training data. The results show that memory-like biases can emerge in LLMs, which has implications for their reliability in applications and calls for broader, open cross-model research.

Abstract

The study explores whether Large Language Models (LLMs) exhibit negation-induced forgetting (NIF), a cognitive phenomenon observed in humans in which negating incorrect attributes of an object or event leads to poorer recall of that object or event than affirming correct attributes (Mayo et al., 2014; Zang et al., 2023). We adapted Zang et al.'s (2023) experimental framework to test this effect in ChatGPT-3.5, GPT-4o-mini, and LLaMA-3-70B-Instruct. Our results show that ChatGPT-3.5 exhibits NIF: negated information is less likely to be recalled than affirmed information. GPT-4o-mini showed a marginally significant NIF effect, while LLaMA-3-70B did not exhibit NIF. The findings provide initial evidence of negation-induced forgetting in some LLMs, suggesting that similar cognitive biases may emerge in these models. This work is a preliminary step toward understanding how memory-related phenomena manifest in LLMs.
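NIF is defined above as lower recall for negated than for affirmed information, and the figures report mean memory failure per condition ± standard error. The sketch below shows one way to compute that descriptive contrast from per-item free-recall outcomes. It is not the authors' analysis pipeline, since the statistical model they used is not described here. The trial format and the names `failure_stats` and `nif_effect` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical trial format: (condition, recalled), where condition is
# "affirmed" or "negated" and recalled is True if the item appeared in free recall.


def failure_stats(trials, condition):
    """Mean memory failure (1 = not recalled) and its standard error for one condition."""
    failures = [0.0 if recalled else 1.0 for cond, recalled in trials if cond == condition]
    m = mean(failures)  # raises StatisticsError if the condition has no trials
    # Standard error of the mean; undefined for a single observation, so report 0.0.
    se = stdev(failures) / sqrt(len(failures)) if len(failures) > 1 else 0.0
    return m, se


def nif_effect(trials):
    """Negated minus affirmed failure rate; a positive value is in the NIF direction."""
    neg_mean, _ = failure_stats(trials, "negated")
    aff_mean, _ = failure_stats(trials, "affirmed")
    return neg_mean - aff_mean


# Toy, made-up data for illustration only (not results from the paper).
trials = (
    [("affirmed", True)] * 8 + [("affirmed", False)] * 2
    + [("negated", True)] * 5 + [("negated", False)] * 5
)
print(failure_stats(trials, "affirmed"), failure_stats(trials, "negated"))
print(nif_effect(trials))
```

A positive difference only indicates the direction of the effect. Whether it is significant, as reported for ChatGPT-3.5, requires an inferential test across trials or model runs.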

Paper Structure

This paper contains 19 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Negation-induced forgetting effect in Zang et al.'s (2023) Experiments 1 and 2.
  • Figure 2: Pilot - Negation-induced forgetting effect in the filler (left) and in the no-filler (right) conditions.
  • Figure 3: GPT-4o-mini - Mean failure in memory per condition ± standard error. The negation-induced forgetting effect is marginally significant.
  • Figure 4: LLaMA-3-70B - Mean failure in memory per condition ± standard error. There is no negation-induced forgetting effect.