Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning

Jingqi Tong; Jixin Tang; Hangcheng Li; Yurong Mou; Ming Zhang; Jun Zhao; Yanbo Wen; Fan Song; Jiahao Zhan; Yuyang Lu; Chaoran Tao; Zhiyuan Guo; Jizhou Yu; Tianhao Cheng; Zhiheng Xi; Changhao Jiang; Zhangyue Yin; Yining Zheng; Weifeng Ge; Guanhua Chen; Tao Gui; Xipeng Qiu; Qi Zhang; Xuanjing Huang

TL;DR

This paper introduces Game-RL, a framework that uses synthesizable, verifiable game data to improve the general reasoning of Vision-Language Models (VLMs) through reinforcement learning (RL). A novel Code2Logic pipeline converts game code into a large multimodal reasoning dataset called GameQA, which spans 30 games, 158 tasks, and about 140K questions. RL training with GRPO solely on GameQA yields improvements across seven vision-language benchmarks and shows out-of-domain generalization, suggesting that video game environments are a practical resource for building broad reasoning capability. The work also reports on data quality and on the effects of scaling and diversity, and confirms that a gap remains between current models and human performance on the GameQA benchmark.

Abstract

Vision-language reinforcement learning (RL) has primarily focused on narrow domains (e.g., geometry or chart reasoning). This leaves broader training scenarios and resources underexplored, limiting the exploration and learning of Vision Language Models (VLMs) through RL. We find that video games inherently provide rich visual elements and mechanics that are easy to verify. To fully use the multimodal and verifiable rewards in video games, we propose Game-RL, which constructs diverse game tasks for RL training to boost VLMs' general reasoning ability. To obtain training data, we propose Code2Logic, a novel approach that adapts game code to synthesize game reasoning task data, thus obtaining the GameQA dataset of 30 games and 158 tasks with controllable difficulty gradation. Unexpectedly, RL training solely on GameQA enables multiple VLMs to achieve performance improvements across 7 diverse vision-language benchmarks, demonstrating the value of Game-RL for enhancing VLMs' general reasoning. Furthermore, this suggests that video games may serve as valuable scenarios and resources to boost general reasoning abilities. Our code, dataset and models are available at the GitHub repository.
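
To make the idea of a verifiable game reward concrete, the following is a minimal sketch and not the authors' Code2Logic pipeline. It assumes a hypothetical toy grid game, renders the state as text rather than as an image, and uses the commonly cited group-relative advantage normalization from GRPO. All names (`make_sample`, `reward`, `group_relative_advantages`) are illustrative assumptions.

```python
import random


def make_sample(rng, size=4):
    """Build one question/answer pair from a toy grid game (hypothetical game).

    The answer is computed by the game logic itself, so it can be checked
    automatically without human labeling.
    """
    cells = [(r, c) for r in range(size) for c in range(size)]
    # Pick one player cell and one to three distinct coin cells.
    player, *coins = rng.sample(cells, 1 + rng.randint(1, 3))
    grid = [["." for _ in range(size)] for _ in range(size)]
    grid[player[0]][player[1]] = "P"
    for r, c in coins:
        grid[r][c] = "C"
    # Ground truth: Manhattan distance from the player to the nearest coin.
    answer = min(abs(player[0] - r) + abs(player[1] - c) for r, c in coins)
    return {
        "state": ["".join(row) for row in grid],
        "question": "How many moves does P need to reach the nearest coin C?",
        "answer": str(answer),
    }


def reward(sample, model_output):
    """Binary verifiable reward: 1.0 on exact match with the game's answer."""
    return 1.0 if model_output.strip() == sample["answer"] else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

In this sketch, difficulty could be varied through `size` or the number of coins, which loosely mirrors the controllable difficulty gradation described in the abstract. The actual GameQA tasks are multimodal and far more diverse.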
