Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour

Adeela Bashir, Zhao Song, Ndidi Bianca Ogbo, Nataliya Balabanova, Martin Smit, Chin-wing Leung, Paolo Bova, Manuel Chica Serrano, Dhanushka Dissanayake, Manh Hong Duong, Elias Fernandez Domingos, Nikita Huber-Kralj, Marcus Krellner, Andrew Powell, Stefan Sarkadi, Fernando P. Santos, Zia Ush Shamszaman, Chaimaa Tarzi, Paolo Turrini, Grace Ibukunoluwa Ufeoshi, Victor A. Vargas-Perez, Alessandro Di Stefano, Simon T. Powers, The Anh Han

Abstract

AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing users' trust as a one-shot adoption choice rather than as a dynamic, evolving process shaped by repeated interactions. We instead model trust as reduced monitoring in a repeated, asymmetric interaction between users and AI developers, where checking AI behaviour is costly. Using evolutionary game theory, we study how user trust strategies and developer choices between safe (compliant) and unsafe (non-compliant) AI co-evolve under different levels of monitoring cost and institutional regimes. We complement the infinite-population replicator analysis with stochastic finite-population dynamics and reinforcement learning (Q-learning) simulations. Across these approaches, we find three robust long-run regimes: no adoption with unsafe development, unsafe but widely adopted systems, and safe systems that are widely adopted. Only the last is desirable, and it arises when penalties for unsafe behaviour exceed the extra cost of safety and users can still afford to monitor at least occasionally. Our results formally support governance proposals that emphasise transparency, low-cost monitoring, and meaningful sanctions, and they show that neither regulation alone nor blind user trust is sufficient to prevent evolutionary drift towards unsafe or low-adoption outcomes.
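
For orientation, the infinite-population analysis referred to above can be cast as standard two-population (asymmetric) replicator dynamics. In the sketch below, $A$ and $B$ stand in for the user and developer payoff matrices defined later in the paper, so the equations illustrate the method rather than reproduce the model's exact payoffs:

$$
\dot{x}_i = x_i\left[(A\mathbf{y})_i - \mathbf{x}^{\top} A \mathbf{y}\right],
\qquad
\dot{y}_j = y_j\left[(B^{\top}\mathbf{x})_j - \mathbf{y}^{\top} B^{\top} \mathbf{x}\right],
$$

where $\mathbf{x}$ and $\mathbf{y}$ are the frequency vectors of user and developer strategies, respectively.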

Paper Structure

This paper contains 21 sections, 20 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Interaction Sequences between Strategies. Each block represents an action of the user (left stack) and the developer (right stack), which can be cooperate (white) or defect (dark red). Users may also monitor the creator's behaviour, paying a cost (symbols to the right of the stacks of TFT, TUA and DtG). The figure illustrates the differences between the conditional strategies: while TFT always observes, TUA may enter a state of trust after observing the creator cooperate for $\theta_T$ consecutive rounds (in this example $\theta_T = 3$), whereas DtG may enter a state of distrust after observing $\theta_D$ defections. In those states, observation happens only with probability $p_T$ and $p_D$, respectively (a minimal code sketch of these conditional strategies is given after this figure list).
  • Figure 2: Trust-based strategies enhance user adoption, although adoption declines as monitoring becomes more expensive. The first and second columns show the stationary distributions of each state as a function of monitoring cost for scenarios without and with trust-based strategies, respectively. The third column displays the difference in user adoption levels between these two cases across varying monitoring costs. Rows from top to bottom correspond to increasing levels of institutional punishment ($v=0.1$, $0.5$, and $1$). Parameters are set to $b_u=b_c=4$, $\beta=0.1$, $Z_u=Z_c=100$, $c=0.5$, $\mu=-0.2$, $r=10$, $\theta_T=\theta_D=3$, and $p_T=p_D=0.25$.
  • Figure 3: Trust-based strategies enhance user adoption, which further increases with stronger institutional punishment. The first and second columns show the stationary distributions of each state as a function of monitoring cost for scenarios without and with trust-based strategies, respectively. The third column displays the difference in user adoption levels between these two cases across varying monitoring costs. Rows from top to bottom correspond to increasing levels of institutional punishment ($\epsilon=0.1$, $0.5$, and $1$). Parameters are set to $b_u=b_c=4$, $\beta=0.1$, $Z_u=Z_c=100$, $c=0.5$, $\mu=-0.2$, $r=10$, $\theta_T=\theta_D=3$, and $p_T=p_D=0.25$.
  • Figure 4: Numerical modelling of user (top row) and creator (bottom row) cooperation rates for $p_T = 1/4$, $p_D = 1/4$, $\theta_T = 3$, $\theta_D = 3$, $b_{\mathrm{u}} = 4$, $b_{\mathrm{d}} = 4$, $r = 10$, $\mu = -2/10$, $v = 1/10$, $c = 1/2$. The initial condition was an equal distribution of all strategies among users and creators.
  • Figure 5: Percentage of users (top row) adopting different strategies and creator (bottom row) cooperation rates across episodes under Q-learning. The game settings are $p_T = 1/4$, $p_D = 1/4$, $\theta_T = 3$, $\theta_D = 3$, $b_{\mathrm{u}} = 4$, $b_{\mathrm{d}} = 4$, $r = 10$, $\mu = -2/10$, $v = 1/10$, $c = 1/2$. For Q-learning, $\alpha=0.05$ and $\epsilon_L=0.05$ (a minimal sketch of the corresponding update rule is also given after this figure list).
  • ...and 3 more figures
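
To make the conditional strategies in Figure 1 concrete, the sketch below (in Python) implements one plausible reading of the caption: TFT always monitors; TUA switches to monitoring with probability $p_T$ once it has observed $\theta_T$ consecutive cooperations; DtG switches to monitoring with probability $p_D$ once it has observed $\theta_D$ defections. Class and method names are illustrative, and details the caption leaves open are flagged as assumptions in the comments; this is not the authors' implementation.

import random

class ConditionalUser:
    """Illustrative sketch (not the authors' code) of the conditional
    monitoring strategies described in the Figure 1 caption."""

    def __init__(self, kind, theta_T=3, theta_D=3, p_T=0.25, p_D=0.25):
        assert kind in ("TFT", "TUA", "DtG")
        self.kind = kind
        self.theta_T = theta_T    # consecutive cooperations before TUA enters the trust state
        self.theta_D = theta_D    # observed defections before DtG enters the distrust state
        self.p_T = p_T            # monitoring probability in the trust state
        self.p_D = p_D            # monitoring probability in the distrust state
        self.coop_streak = 0
        self.defections_seen = 0
        self.trusting = False
        self.distrusting = False

    def monitors_this_round(self):
        # TFT always pays the monitoring cost; TUA/DtG monitor with reduced
        # probability once they have entered their trust/distrust state.
        if self.kind == "TUA" and self.trusting:
            return random.random() < self.p_T
        if self.kind == "DtG" and self.distrusting:
            return random.random() < self.p_D
        return True

    def observe(self, developer_cooperated):
        # Update counters and state after a round the user actually monitored.
        # Whether the trust/distrust states can later be exited is left open
        # in the caption, so this sketch treats them as absorbing (an assumption).
        if developer_cooperated:
            self.coop_streak += 1
        else:
            self.coop_streak = 0
            self.defections_seen += 1
        if self.kind == "TUA" and self.coop_streak >= self.theta_T:
            self.trusting = True
        if self.kind == "DtG" and self.defections_seen >= self.theta_D:
            self.distrusting = True

One natural way to use this in a round-based simulation is to charge the monitoring cost, and call observe(), only on rounds where monitors_this_round() returns True, with unmonitored rounds proceeding on the user's current (dis)trust assumption.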
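
Similarly, for the Q-learning parameters quoted in the Figure 5 caption, the fragment below shows the standard tabular update and epsilon-greedy action choice that $\alpha$ (learning rate) and $\epsilon_L$ (exploration rate) refer to. The state and action encoding, the discount factor, and the reward signal are placeholders, not the authors' implementation.

import random
from collections import defaultdict

ACTIONS = ["comply", "defect"]                      # illustrative action labels
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})  # Q-table: state -> action values

def choose_action(state, eps_L=0.05):
    """Epsilon-greedy choice: explore with probability eps_L, else exploit."""
    if random.random() < eps_L:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

def q_update(state, action, reward, next_state, alpha=0.05, gamma=0.95):
    """Tabular Q-learning update with learning rate alpha; gamma is an
    assumed discount factor, not reported in this excerpt."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])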