From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization

Haonian Ji, Shi Qiu, Siyang Xin, Siwei Han, Zhaorun Chen, Dake Zhang, Hongyi Wang, Huaxiu Yao

TL;DR

The paper identifies a critical gap in AI-generated visual pedagogy for STEM problem solving and presents EduVisBench, a multi-domain benchmark with a five-dimension rubric for evaluating visually grounded reasoning. To close this gap, it introduces EduVisAgent, a modular multi-agent framework that orchestrates instructional planning, reasoning decomposition, metacognitive prompting, and visualization design to produce interactive, pedagogy-aligned visuals. In experiments, EduVisAgent achieves an average of 81.6% on EduVisBench, a 40.2% relative improvement over the best baseline, which underscores the value of coordinated, domain-aware agent collaboration for educational visualization. The work thus offers both a method for generating interactive visual explanations and a scalable platform for evaluating visually grounded pedagogy in AI systems.
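As a quick sanity check on the two headline numbers, the relative improvement can be inverted to recover the baseline score it implies. The sketch below assumes that "relative improvement" means (new - baseline) / baseline; the function name is illustrative, and the derived baseline is an implied figure rather than one reported in this summary.

```python
def implied_baseline(new_score: float, relative_improvement: float) -> float:
    """Invert new = baseline * (1 + r) to recover the baseline score."""
    return new_score / (1.0 + relative_improvement)


# Headline numbers from the TL;DR: 81.6% average, 40.2% relative improvement.
baseline = implied_baseline(81.6, 0.402)
print(f"Implied best-baseline average: {baseline:.1f}%")
```

Under that assumption the best baseline would sit at roughly 58.2%, so the gain is about 23 percentage points in absolute terms.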

Abstract

While foundation models (FMs), such as diffusion models and large vision-language models (LVLMs), have been widely applied in educational contexts, their ability to generate pedagogically effective visual explanations remains limited. Most existing approaches focus primarily on textual reasoning, overlooking the critical role of structured and interpretable visualizations in supporting conceptual understanding. To better assess the visual reasoning capabilities of FMs in educational settings, we introduce EduVisBench, a multi-domain, multi-level benchmark. EduVisBench features diverse STEM problem sets requiring visually grounded solutions, along with a fine-grained evaluation rubric informed by pedagogical theory. Our empirical analysis reveals that existing models frequently struggle with the inherent challenge of decomposing complex reasoning and translating it into visual representations aligned with human cognitive processes. To address these limitations, we propose EduVisAgent, a multi-agent collaborative framework that coordinates specialized agents for instructional planning, reasoning decomposition, metacognitive prompting, and visualization design. Experimental results show that EduVisAgent substantially outperforms all baselines, achieving a 40.2% improvement and delivering more educationally aligned visualizations. EduVisBench and EduVisAgent are available at https://github.com/aiming-lab/EduVisBench and https://github.com/aiming-lab/EduVisAgent.
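To make the idea of coordinating specialized agents concrete, here is a minimal sketch of four roles run in sequence over a shared context. It is an illustration only: the sequential ordering, the `Context` class, and every function name are assumptions made for this example, and the agents are stubs that return placeholder strings instead of calling a model. The actual EduVisAgent design may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Context:
    """Shared state passed from one agent to the next (hypothetical)."""
    question: str
    notes: Dict[str, str] = field(default_factory=dict)


Agent = Callable[[Context], str]


# Hypothetical stand-ins: a real agent would call a model here.
def plan_instruction(ctx: Context) -> str:
    return f"Learning goal for: {ctx.question}"


def decompose_reasoning(ctx: Context) -> str:
    return f"Solution steps that follow '{ctx.notes['planning']}'"


def add_metacognitive_prompts(ctx: Context) -> str:
    return f"Check-your-understanding prompts for '{ctx.notes['reasoning']}'"


def design_visualization(ctx: Context) -> str:
    # The final agent sees every earlier contribution.
    return f"Visual spec built from {len(ctx.notes)} earlier notes"


# Assumed ordering: the four roles named in the abstract, run one after another.
PIPELINE: List[Tuple[str, Agent]] = [
    ("planning", plan_instruction),
    ("reasoning", decompose_reasoning),
    ("metacognition", add_metacognitive_prompts),
    ("visualization", design_visualization),
]


def run_pipeline(question: str) -> Context:
    """Run each agent in order, storing its output under its role name."""
    ctx = Context(question)
    for role, agent in PIPELINE:
        ctx.notes[role] = agent(ctx)
    return ctx


result = run_pipeline("Why do the angles of a triangle sum to 180 degrees?")
for role, note in result.notes.items():
    print(f"{role}: {note}")
```

Keeping each role behind the same `Agent` signature means a stub can be swapped for a model-backed implementation without touching the orchestration loop.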

Paper Structure

This paper contains 27 sections, 7 figures, 9 tables.

Figures (7)

  • Figure 1: GPT-4o fails to illustrate its problem-solving with high-quality, logical, and explanatory visualizations.
  • Figure 2: Dataset distribution of EduVisBench. Each domain encompasses various sub-domains, collectively covering 15 comprehensive pedagogical scenarios.
  • Figure 3: Representative examples from EduVisBench, featuring questions from Maths, Chemistry, and Physics alongside their corresponding high-scoring visual explanations. These interactive visualizations, generated by our multi-agent system EduVisAgent, exemplify well-designed, pedagogically effective outputs for STEM problems.
  • Figure 4: Workflow for the EduVisBench benchmark evaluation. Models receive a visualization prompt and a question to generate visual outputs. All resulting visualizations undergo evaluation by GPT-4o across five dimensions to compute a total performance score.
  • Figure 5: The structure of EduVisAgent.
  • ...and 2 more figures
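The Figure 4 caption describes per-dimension scores being combined into a total performance score. A minimal sketch of one way such an aggregation could work is shown below. Only the number of dimensions (five) comes from this summary; the 0 to 5 scale per dimension, the placeholder dimension names, and the normalization to a percentage are assumptions made for illustration.

```python
from typing import Dict

NUM_DIMENSIONS = 5      # stated in the summary
MAX_PER_DIMENSION = 5   # assumed scale, not stated in the summary


def total_score(scores: Dict[str, int]) -> float:
    """Sum the per-dimension scores and express the total as a percentage."""
    if len(scores) != NUM_DIMENSIONS:
        raise ValueError(f"expected {NUM_DIMENSIONS} dimensions, got {len(scores)}")
    for name, value in scores.items():
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} is outside 0..{MAX_PER_DIMENSION}: {value}")
    return 100.0 * sum(scores.values()) / (NUM_DIMENSIONS * MAX_PER_DIMENSION)


# Placeholder dimension names; the real rubric labels are not given here.
example = {"dim_1": 4, "dim_2": 5, "dim_3": 3, "dim_4": 4, "dim_5": 4}
print(total_score(example))  # 20 of 25 points -> 80.0
```

Validating the dimension count and range before summing guards against a judge response that omits a dimension or returns an out-of-range value.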