Play to Generalize: Learning to Reason Through Game Play
Yunfei Xie, Yinsong Ma, Shiyi Lan, Alan Yuille, Junfei Xiao, Chen Wei
TL;DR
This work introduces Visual Game Learning (ViGaL), a post-training paradigm that fine-tunes a 7B multimodal LLM via reinforcement learning on simple visual arcade games to elicit transferable reasoning. Despite using no in-domain math data during RL, ViGaL shows strong zero-shot generalization to multimodal math and cross-domain benchmarks, often surpassing specialist models trained on target tasks, while preserving broad visual capabilities. Ablations reveal that game design, reward structure, and multimodal inputs jointly shape the downstream benefits, and combining multiple games yields additive gains across math subfields. The results point to a scalable strategy for unlocking generalizable reasoning in multimodal models through surrogate tasks, offering practical implications for data efficiency and model robustness.
Abstract
Developing reasoning capabilities in multimodal large language models (MLLMs) remains challenging. Motivated by literature suggesting that gameplay promotes transferable reasoning skills, we propose a novel post-training method, Visual Game Learning (ViGaL), where MLLMs develop generalizable reasoning skills through playing arcade-like games. Specifically, we show that training a 7B-parameter MLLM via reinforcement learning (RL) on simple games like Snake significantly enhances downstream performance on multimodal math benchmarks like MathVista, on multi-discipline questions like MMMU, and on 3D spatial reasoning benchmarks like VSI-Bench, even though the model sees no worked solutions, equations, or diagrams during RL. Remarkably, our model outperforms specialist models post-trained on benchmark-oriented multimodal reasoning data, while preserving the model's performance on general visual benchmarks, a challenge where specialist models often fall short. Our findings suggest that multimodal reasoning can emerge from gameplay, pointing to a promising strategy of designing surrogate tasks for RL post-training.
