Assessing AI Utility: The Random Guesser Test for Sequential Decision-Making Systems

Shun Ide; Allison Blunt; Djallel Bouneffouf

TL;DR

The paper addresses AI misalignment in sequential decision-making by introducing a random guesser test and evaluating it on a minimal three-action roulette task with both fair and nonstationary dynamics. It compares a simple, robust epsilon-greedy multi-armed bandit (EG MAB), Bayesian Thompson sampling (TS), and temporal-difference (TD) reinforcement learners against a random baseline across finite horizons $T$, with rigorous statistical testing. Surprisingly, the random baseline often outperforms the AI methods, particularly at finite $T$, suggesting that current algorithms may over-prioritize safe, low-variance choices and under-explore high-risk options. The work offers a practical diagnostic for AI utility in sequential settings and motivates stronger exploration as a way to mitigate potential misalignment in real-world recommender systems and similar decision-making pipelines.

Abstract

We propose a general approach to quantitatively assessing the risk and vulnerability of artificial intelligence (AI) systems to biased decisions. The guiding principle of the proposed approach is that any AI algorithm must outperform a random guesser. This may appear trivial, but empirical results from a simplistic sequential decision-making scenario involving roulette games show that sophisticated AI-based approaches often underperform the random guesser by a significant margin. We highlight that modern recommender systems may exhibit a similar tendency to favor overly low-risk options. We argue that this "random guesser test" can serve as a useful tool for evaluating the utility of AI actions, and also points towards increasing exploration as a potential improvement to such systems.
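To make the idea concrete, the following is a minimal sketch of how a random guesser test could be set up in simulation. It is not the authors' implementation. The three bets, their payouts, the American-wheel odds, the epsilon value, the horizon, and all function names (`spin`, `run_episode`, `profit_rate`) are assumptions chosen for illustration; only the overall shape (a learner compared against a uniformly random policy on the share of simulations ending with a net profit) follows the description above.

```python
import random

# Hypothetical three-action roulette on a 38-pocket wheel. These bets and
# payouts are illustrative assumptions, not the setup used in the paper.
ACTIONS = [
    (18 / 38, 1),   # even-money bet: low risk
    (12 / 38, 2),   # dozen bet: medium risk
    (1 / 38, 35),   # single-number bet: high risk
]


def spin(action, rng):
    """Return the net reward of a one-unit bet on the given action."""
    p_win, payout = ACTIONS[action]
    return payout if rng.random() < p_win else -1


def random_guesser(rng, values, counts):
    """Baseline: pick an action uniformly at random, ignoring history."""
    return rng.randrange(len(ACTIONS))


def epsilon_greedy(rng, values, counts, epsilon=0.1):
    """Explore with probability epsilon, otherwise pick the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def run_episode(policy, horizon, rng):
    """Play `horizon` rounds and return the cumulative net profit."""
    values = [0.0] * len(ACTIONS)  # running mean reward per action
    counts = [0] * len(ACTIONS)    # number of pulls per action
    profit = 0
    for _ in range(horizon):
        a = policy(rng, values, counts)
        r = spin(a, rng)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        profit += r
    return profit


def profit_rate(policy, horizon=50, episodes=2000, seed=0):
    """Fraction of simulated episodes that end with a net profit."""
    rng = random.Random(seed)
    wins = sum(run_episode(policy, horizon, rng) > 0 for _ in range(episodes))
    return wins / episodes


if __name__ == "__main__":
    print("random guesser:", profit_rate(random_guesser))
    print("epsilon-greedy:", profit_rate(epsilon_greedy))
```

Under these assumed odds all three bets have the same expected loss per unit staked, so any difference in the two printed rates comes from how each policy distributes its bets across risk levels rather than from one bet being better on average. A learner would pass the test in this sketch only if its rate is reliably above that of `random_guesser` over many seeds.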

Paper Structure (9 sections, 4 equations, 4 figures, 1 table)

Introduction
Related Works
Problem Setting
Sequential Gambling
Algorithms Tested
Empirical Evaluation
Success Over Rounds
Stationary and Nonstationary Scenario
Concluding Remarks

Figures (4)

Figure 1: Recorded simulations with a net profit at 50 rounds. The random strategy was profitable most often, at a rate of roughly 40%.
Figure 2: Recorded simulations with a net profit at 500 rounds. A larger gap develops between Random and the other algorithms.
Figure 3: Survival of algorithms in the stationary scenario. Random survives for significantly longer.
Figure 4: Survival of algorithms in the nonstationary scenario. Thompson sampling (TS) appears to take advantage of the nonstationarity but nevertheless goes bankrupt faster than Random.
