In-context learning agents are asymmetric belief updaters

Johannes A. Schubert, Akshay K. Jagadish, Marcel Binz, Eric Schulz

TL;DR

The paper investigates how LLMs update their beliefs through in-context learning when solving two-alternative forced choice (2AFC) tasks. It finds an optimism bias for chosen outcomes that depends on agency and on how feedback is framed. The authors fit Rescorla-Wagner (RW) models to the behavior of LLMs, humans, and Meta-RL agents, including the RW± variant, which has separate learning rates $\alpha^+$ and $\alpha^-$ for positive and negative prediction errors. They show that partial feedback induces a bias toward positive updating. Under full feedback, the unchosen option is instead updated more strongly after negative prediction errors, which is consistent with confirmation bias. Agency modulates these effects: asymmetric updating disappears when outcomes follow imposed (forced) choices and is present when the agent chooses freely. Meta-RL agents display analogous patterns. These findings suggest that the way a problem is framed shapes in-context learning, possibly in a rational way. They also offer a methodology for diagnosing learning dynamics in artificial agents.
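The RW± model mentioned above extends the Rescorla-Wagner rule with separate learning rates for positive and negative prediction errors. The sketch below is a minimal illustration under standard assumptions, not the authors' code. On each trial only the chosen option's value $Q$ is updated (partial feedback), via $Q \leftarrow Q + \alpha\,\delta$ with $\delta = r - Q$, using $\alpha^+$ when $\delta > 0$ and $\alpha^-$ otherwise. The function names and the ±1 outcome coding are hypothetical choices made for illustration. The helper `asymptotic_value` works out where the expected update settles, which shows why $\alpha^+ > \alpha^-$ amounts to optimism.

```python
def rw_pm_update(q, choice, reward, alpha_pos, alpha_neg):
    """Asymmetric Rescorla-Wagner (RW±) update under partial feedback.

    q: current value estimates, one per option.
    choice: index of the chosen option; only this option is updated.
    Returns a new list; the input list is left unchanged.
    """
    q = list(q)
    delta = reward - q[choice]  # prediction error for the chosen option
    alpha = alpha_pos if delta > 0 else alpha_neg  # rate depends on the sign of the error
    q[choice] += alpha * delta
    return q


def asymptotic_value(p, alpha_pos, alpha_neg):
    """Fixed point of the expected (trial-averaged) RW± update for an option
    that pays +1 with probability p and -1 otherwise.

    Setting p * alpha_pos * (1 - Q) + (1 - p) * alpha_neg * (-1 - Q) = 0
    and solving for Q gives the value below. With alpha_pos == alpha_neg it
    equals the true expected outcome 2p - 1. With alpha_pos > alpha_neg it
    lies above that value (optimism).
    """
    up = p * alpha_pos
    down = (1 - p) * alpha_neg
    return (up - down) / (up + down)


q = [0.0, 0.0]
q = rw_pm_update(q, choice=0, reward=1.0, alpha_pos=0.5, alpha_neg=0.1)   # q[0] becomes 0.5
q = rw_pm_update(q, choice=0, reward=-1.0, alpha_pos=0.5, alpha_neg=0.1)  # q[0] drops by only 0.15
print(q)
print(asymptotic_value(0.5, 0.5, 0.1))  # about 0.667, although the true mean outcome is 0
```

Setting `alpha_pos == alpha_neg` recovers the plain RW model, which is why the two can be compared as nested variants.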

Abstract

We study the in-context learning dynamics of large language models (LLMs) using three instrumental learning tasks adapted from cognitive psychology. We find that LLMs update their beliefs in an asymmetric manner and learn more from better-than-expected outcomes than from worse-than-expected ones. Furthermore, we show that this effect reverses when learning about counterfactual feedback and disappears when no agency is implied. We corroborate these findings by investigating idealized in-context learning agents derived through meta-reinforcement learning, where we observe similar patterns. Taken together, our results contribute to our understanding of how in-context learning works by highlighting that the framing of a problem significantly influences how learning occurs, a phenomenon also observed in human cognition.
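The counterfactual-feedback and agency manipulations described in the abstract correspond to extended update rules (Figures 3 and 4). The sketch below is a hypothetical illustration, not the authors' implementation. Under full feedback, both options are updated, and each has its own pair of learning rates for positive and negative prediction errors, giving four rates in total. In the $3\alpha$ agency model, free-choice trials use $\alpha^+$ and $\alpha^-$, while forced-choice trials use a single rate. The function names, the dictionary keys, and the parameter values are assumptions.

```python
def full_feedback_update(q, choice, rewards, rates):
    """Update both options under full (counterfactual) feedback.

    rewards: outcomes of both options on this trial, indexed like q.
    rates: learning rates keyed by (role, sign), where role is "chosen" or
        "unchosen" and sign is "+" or "-" (four free parameters).
    """
    q = list(q)
    for i in range(len(q)):
        delta = rewards[i] - q[i]
        role = "chosen" if i == choice else "unchosen"
        sign = "+" if delta > 0 else "-"
        q[i] += rates[(role, sign)] * delta
    return q


def agency_update(q, choice, reward, free_choice, alpha_pos, alpha_neg, alpha_forced):
    """3-alpha model: asymmetric rates on free-choice trials, one rate on forced-choice trials."""
    q = list(q)
    delta = reward - q[choice]
    if free_choice:
        alpha = alpha_pos if delta > 0 else alpha_neg
    else:
        alpha = alpha_forced  # same rate for positive and negative errors
    q[choice] += alpha * delta
    return q


# Confirmation-bias pattern (illustrative values): optimistic for the chosen
# option, pessimistic for the unchosen one.
rates = {("chosen", "+"): 0.4, ("chosen", "-"): 0.1,
         ("unchosen", "+"): 0.1, ("unchosen", "-"): 0.4}
print(full_feedback_update([0.0, 0.0], choice=0, rewards=[1.0, -1.0], rates=rates))
# prints [0.4, -0.4]
```

Tying the rates so that "chosen +" equals "unchosen -" and "chosen -" equals "unchosen +" would give a two-rate confirmatory variant. This is stated here as an assumption about how such a simplification can be parameterized.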

Paper Structure

This paper contains 20 sections, 6 equations, and 9 figures.

Figures (9)

  • Figure 1: Schematic of our methodology, where we evaluate the learning dynamics of LLMs, humans, and meta-reinforcement learning (Meta-RL) agents on two-alternative forced choice tasks. After evaluating the agents on the tasks, we fit variants of cognitive models based on the Rescorla-Wagner (RW) model to the resulting behavior. Finally, we analyze the fitted models and extract and compare the learning rates.
  • Figure 2: 2AFC task with partial feedback. (a) Presentation of a single trial. First, two slot machines, shown as symbols, are presented. After a choice is made, the outcome is shown. (b) Average performance of the LLM and humans, measured in terms of regret. Performance improves over trials. (c) Model comparison of the Rescorla-Wagner (RW) model and the RW$\pm$ model. For both the LLM (left) and human participants (right), RW$\pm$ provides a better fit to the data, as indicated by the average posterior probability (PP). (d) Average learning rates of the RW$\pm$ model for the LLM and human participants. Both agents respond more strongly to positive prediction errors than to negative prediction errors. Human participant data reproduced from Lefebvre et al. (2017). Error bars and shaded areas correspond to 95% CIs.
  • Figure 3: 2AFC task with full feedback. (a) Presentation of a single trial: the two slot machines are again shown as symbols. After a choice is made, the outcomes of both the chosen and the unchosen option are shown. (b) Average regret of the LLM in partial-feedback and full-feedback blocks, showing that the additional information in full-feedback blocks improves performance. (c) Average regret of the LLM and humans in full-feedback blocks. The LLM's performance improves over trials and is on par with human performance. (d) Learning rates of the full-feedback model, which has separate learning rates for positive and negative prediction errors for both the chosen and the unchosen slot machine. Both agents show an optimism bias for the chosen option and a pessimism bias for the unchosen option. Human participant data reproduced from Chambon et al. (2020). Error bars and shaded areas correspond to 95% CIs.
  • Figure 4: 2AFC task for the agency condition. (a) Presentation of a single forced-choice trial: one of the two slot machines is preselected (red square), and its outcome is presented directly to the LLM. (b) Average regret of the LLM in mixed-choice and free-choice blocks, showing that the additional information from forced-choice trials in mixed-choice blocks improves performance. (c) Average regret of the LLM and humans in mixed-choice blocks. The LLM outperforms the human participants over trials. (d) Learning rates of the $3\alpha$ model, which has two learning rates for free-choice trials and one learning rate for forced-choice trials. Both agents integrate positive and negative prediction errors asymmetrically in free-choice trials, whereas they integrate feedback from forced-choice trials symmetrically. Human participant data reproduced from Chambon et al. (2020). Error bars and shaded areas correspond to 95% CIs.
  • Figure 5: Learning rate analyses for the Meta-RL agent. (a) In the partial-feedback task, the RW$\pm$ model provided a better fit to the Meta-RL agents' behavior ($\text{PP}_{\text{RW}\pm} = 0.99 \pm 0.00$) and showed an optimistic tendency to integrate information. (b) In the full-feedback task, the tendency to integrate positive outcomes for chosen options optimistically and negative outcomes for unchosen options pessimistically is even more pronounced than in the LLMs (left). Model comparison showed that the simplified confirmatory model ($2\alpha$) fits the data better (right). (c) In the agency condition, the $3\alpha$ model best fit the simulated behavior ($\text{PP}_{3\alpha} = 0.85 \pm 0.04$). This implies that information was integrated asymmetrically in free-choice trials and symmetrically in forced-choice trials. Error bars correspond to 95% CIs.
  • ...and 4 more figures
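The pipeline in Figure 1 can be sketched end to end: collect choices, fit RW variants, compare models, and read off learning rates. This is a simplified, hypothetical stand-in, not the paper's procedure. It makes the following assumptions:

  • a two-armed bandit with ±1 outcomes;
  • a softmax choice rule with inverse temperature `beta`;
  • maximum-likelihood fitting by coarse grid search;
  • BIC in place of the posterior-probability (PP) model comparison reported in the figures;
  • regret defined as the expected outcome of the best option minus that of the chosen option.

All function names and parameter values are illustrative.

```python
import math
import random


def choice_prob_0(q, beta):
    """Softmax probability of choosing option 0 in a two-option task."""
    # For two options, softmax reduces to a logistic function of the value difference.
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def rw_step(q, c, r, alpha_pos, alpha_neg):
    """In-place RW± update of the chosen option c (partial feedback)."""
    delta = r - q[c]
    q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta


def simulate(n_trials, p_reward, alpha_pos, alpha_neg, beta, seed=0):
    """Simulate an RW± agent; option i pays +1 with probability p_reward[i], else -1."""
    rng = random.Random(seed)
    ev = [2 * p - 1 for p in p_reward]  # expected outcome of each option
    q = [0.0, 0.0]
    choices, rewards, regret = [], [], []
    for _ in range(n_trials):
        c = 0 if rng.random() < choice_prob_0(q, beta) else 1
        r = 1.0 if rng.random() < p_reward[c] else -1.0
        rw_step(q, c, r, alpha_pos, alpha_neg)
        choices.append(c)
        rewards.append(r)
        regret.append(max(ev) - ev[c])  # expected reward forgone on this trial
    return choices, rewards, regret


def neg_log_lik(choices, rewards, alpha_pos, alpha_neg, beta):
    """Negative log-likelihood of a choice sequence under RW± with softmax choice."""
    q = [0.0, 0.0]
    nll = 0.0
    for c, r in zip(choices, rewards):
        p0 = choice_prob_0(q, beta)
        p_choice = p0 if c == 0 else 1.0 - p0
        nll -= math.log(max(p_choice, 1e-12))  # guard against log(0)
        rw_step(q, c, r, alpha_pos, alpha_neg)
    return nll


def fit(choices, rewards, symmetric):
    """Grid-search ML fit; symmetric=True ties alpha+ = alpha- (plain RW model).

    Returns (nll, alpha_pos, alpha_neg, beta) of the best grid point.
    """
    rates = [i / 10 for i in range(1, 10)]  # 0.1 ... 0.9
    betas = [1.0, 2.0, 4.0, 8.0]
    best = None
    for a_pos in rates:
        for a_neg in ([a_pos] if symmetric else rates):
            for b in betas:
                nll = neg_log_lik(choices, rewards, a_pos, a_neg, b)
                if best is None or nll < best[0]:
                    best = (nll, a_pos, a_neg, b)
    return best


def bic(nll, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return 2 * nll + n_params * math.log(n_obs)


if __name__ == "__main__":
    choices, rewards, regret = simulate(1000, [0.75, 0.25], 0.6, 0.1, 4.0, seed=1)
    n = len(choices)
    nll_rw, a, _, _ = fit(choices, rewards, symmetric=True)
    nll_pm, a_pos, a_neg, _ = fit(choices, rewards, symmetric=False)
    print(f"mean regret: {sum(regret) / n:.3f}")
    print(f"RW:  alpha={a:.1f}, BIC={bic(nll_rw, 2, n):.1f}")
    print(f"RW±: alpha+={a_pos:.1f}, alpha-={a_neg:.1f}, BIC={bic(nll_pm, 3, n):.1f}")
```

The RW grid is a subset of the RW± grid, so the RW± fit can never have a higher negative log-likelihood. The BIC parameter penalty is what lets the simpler model win when the asymmetry is not needed. For a strongly optimistic simulated agent like this one, the fitted `alpha_pos` will typically exceed `alpha_neg`. However, the coarse grid and finite data make this a rough parameter-recovery check rather than a guarantee.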