Level Up Your Tutorials: VLMs for Game Tutorials Quality Assessment

Daniele Rege Cambrin, Gabriele Scaffidi Militone, Luca Colomba, Giovanni Malnati, Daniele Apiletti, Paolo Garza

TL;DR

This work tackles the costly process of validating game tutorials by introducing a VLM-driven framework that treats tutorial frames as test cases. By extracting frames, annotating them with developer-defined QA pairs, and comparing VLM outputs against ground-truth answers, the approach quantifies tutorial clarity. The study benchmarks multiple models (GPT-4o, InternVL, DragonFly) on two tutorial versions (P and L) using ROUGE and BERT-Score, showing improvements in the latest version and offering scalable, automated feedback to developers. The results support the viability of automatic tutorial quality assessment and point to future work aligning VLM judgments with player perception and broadening the scope to other quality faults.
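The comparison step can be made concrete with ROUGE-L, one common member of the ROUGE family, computed between a developer's expected answer and a VLM's actual answer. This is a minimal sketch, not the authors' implementation. It assumes ROUGE-L because the variant is not stated here. It also uses plain lowercase whitespace tokenization with no stemming, so its scores can differ from those of standard ROUGE packages. BERT-Score is left out because it needs a pretrained language model.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(expected: str, actual: str) -> float:
    """ROUGE-L F1 between an expected (reference) answer and a VLM's actual answer."""
    # Assumption: naive lowercase whitespace tokenization, no stemming or punctuation handling.
    ref = expected.lower().split()
    hyp = actual.lower().split()
    if not ref or not hyp:
        return 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)      # share of the expected answer recovered
    precision = lcs / len(hyp)   # share of the actual answer that matches
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("press space to jump", "press the space bar to jump")` finds a 4-token common subsequence. That gives recall 1.0 and precision 4/6, for an F1 of about 0.8. ROUGE-L rewards shared words in the same order rather than shared meaning, so a correct paraphrase can still score low. An embedding-based metric such as BERT-Score is a useful complement for that reason.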

Abstract

Designing effective game tutorials is crucial for a smooth learning curve for new players, especially in games with many rules and complex core mechanics. Evaluating the effectiveness of these tutorials usually requires multiple iterations with testers who have no prior knowledge of the game. Recent Vision-Language Models (VLMs) have demonstrated significant capabilities in understanding and interpreting visual content. VLMs can analyze images, provide detailed insights, and answer questions about their content. They can recognize objects, actions, and contexts in visual data, making them valuable tools for various applications, including automated game testing. In this work, we propose an automated game-testing solution to evaluate the quality of game tutorials. Our approach leverages VLMs to analyze frames from video game tutorials, answer relevant questions to simulate human perception, and provide feedback. This feedback is compared with expected results to identify confusing or problematic scenes and highlight potential errors for developers. In addition, we publish complete tutorial videos and annotated frames from different game versions used in our tests. This solution reduces the need for extensive manual testing, especially by speeding up and simplifying the initial development stages of the tutorial to improve the final game experience.
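The final step in the abstract compares feedback with expected results to single out confusing scenes. It can be sketched as a per-frame aggregation. In this hypothetical sketch, `flag_confusing_frames` averages a similarity score over each frame's questions and returns the frames that fall below a threshold, worst first. The similarity function is a stand-in, Python's `difflib.SequenceMatcher` ratio, chosen only to keep the example self-contained. The paper reports ROUGE and BERT-Score instead. The 0.5 threshold and the dictionary layout are also assumptions.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(expected: str, actual: str) -> float:
    """Stand-in similarity in [0, 1]; the paper itself reports ROUGE and BERT-Score."""
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


def flag_confusing_frames(
    answers: dict[str, list[tuple[str, str]]], threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Return (frame_id, mean_score) for frames scoring below `threshold`, worst first.

    `answers` maps a frame id to the (expected, actual) answer pairs of its questions
    (a hypothetical layout). Frames with no questions are skipped.
    """
    frame_scores = {
        frame: mean(similarity(exp, act) for exp, act in pairs)
        for frame, pairs in answers.items()
        if pairs
    }
    flagged = [(frame, s) for frame, s in frame_scores.items() if s < threshold]
    return sorted(flagged, key=lambda item: item[1])
```

A worst-first ranking matches how a developer would triage, because the lowest-scoring frames become the first candidates for redesign. Averaging hides which individual question failed, so a fuller report would likely keep the per-question scores as well.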

Paper Structure (23 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Proposed framework. A VLM is asked to answer questions about tutorial frames. The VLM's actual answers are then compared with the expected answers provided by the developers to produce a quality score. The score points to possible areas of improvement in the final user interface (UI).
  • Figure 2: Example of the two modes of the video game tutorial.
  • Figure 3: Example annotation. For each frame, we provide a list of questions and answers related to the frame.
  • Figure 4: Mapping of the proposed method to the Arrange-Act-Assert (AAA) pattern. Arrange defines the question-frame pair to test. Act uses a VLM to produce the actual answer for the text-image pair. Assert computes a score comparing the expected answer with the actual one. This score is then compared with the Low and High Thresholds (LT, HT) to indicate whether the test passed, requires revision, or failed.
  • Figure 5: Mean Spearman correlation across metrics by model. DF is DragonFly, and VX indicates the InternVL version and, where applicable, the model size.
  • ...and 2 more figures
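The Arrange-Act-Assert flow in Figure 4 maps naturally onto a unit-test style function. The sketch below is hypothetical. `FrameQuestion`, `ask_vlm`, and `run_test` are illustrative names. The VLM and the scoring metric are passed in as callables rather than tied to any real model API. The threshold values (0.4 and 0.7) and the boundary rule (a score equal to a threshold falls into the higher bucket) are assumptions, because the caption does not specify them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASSED = "passed"
    REVISION = "requires revision"
    FAILED = "failed"


@dataclass
class FrameQuestion:
    """Arrange: one question-frame pair with the developer's expected answer (hypothetical schema)."""
    frame_path: str
    question: str
    expected_answer: str


def run_test(
    case: FrameQuestion,
    ask_vlm: Callable[[str, str], str],    # hypothetical: (frame_path, question) -> actual answer
    score: Callable[[str, str], float],    # e.g. ROUGE or BERT-Score: (expected, actual) -> [0, 1]
    low: float = 0.4,                      # LT, illustrative value
    high: float = 0.7,                     # HT, illustrative value
) -> tuple[Verdict, float]:
    """Run one tutorial test case and classify its score against the LT/HT thresholds."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")
    # Act: the VLM answers the question about the frame.
    actual = ask_vlm(case.frame_path, case.question)
    # Assert: score expected vs. actual, then bucket the score by the two thresholds.
    s = score(case.expected_answer, actual)
    if s >= high:
        return Verdict.PASSED, s
    if s >= low:
        return Verdict.REVISION, s
    return Verdict.FAILED, s
```

The VLM and the metric are parameters, so the test logic stays independent of any particular model. The same cases can be rerun against GPT-4o, InternVL, or DragonFly, or scored with a different metric, by swapping a single argument.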