Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games
Jingran Zhang, Ning Li, Justin Cui
TL;DR
The paper evaluates OpenAI's ChatGPT Atlas on web-based games to understand its performance in dynamic interactive tasks. Using a zero-shot, browser-based protocol across five games, it reports strong logical reasoning in Sudoku, poor real-time motor control in reflex-based games, and limited autonomous behavior in RPG navigation. The findings highlight Atlas's analytical strengths alongside notable gaps in execution and contextual understanding, suggesting that its browser-control capabilities suffice for information-retrieval tasks but need targeted improvements for real-time interaction and long-horizon planning. The work offers a pragmatic benchmark and concrete directions for advancing generalist web agents toward robust, end-to-end interaction in complex online environments.
Abstract
OpenAI's ChatGPT Atlas introduces new capabilities for web interaction, enabling the model to analyze webpages, process user intents, and execute cursor and keyboard inputs directly within the browser. While its capacity for information retrieval tasks has been demonstrated, its performance in dynamic, interactive environments remains less explored. In this study, we conduct an early evaluation of Atlas's web interaction capabilities using browser-based games as test scenarios, including Google's T-Rex Runner, Sudoku, Flappy Bird, and Stein.world. We use in-game scores as quantitative metrics to compare performance across task types. Our results show that Atlas performs strongly in logical reasoning tasks like Sudoku, completing puzzles significantly faster than human baselines, but struggles substantially in real-time games requiring precise timing and motor control, often failing to progress beyond initial obstacles. These findings suggest that while Atlas demonstrates capable analytical processing, notable limitations remain in dynamic web environments requiring real-time interaction. The website of our project can be found at https://atlas-game-eval.github.io.
