How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?

Pritam Sil, Durgaprasad Karnam, Vinay Reddy Venumuddala, Pushpak Bhattacharyya

TL;DR

This work proposes MMGrader, an approach that infers the quality of students' mental models from their multimodal responses using concept graphs as an analytical framework, and finds that the best-performing models fall short of human-level performance.

Abstract

STEM mental models can play a critical role in assessing students' conceptual understanding of a topic. They offer insights not only into what students know but also into how effectively they can apply, relate to, and integrate concepts across various contexts. Thus, students' responses are critical markers of the quality of their understanding, not merely entities to be graded. However, inferring these mental models from student answers is challenging because it requires deep reasoning skills. We propose MMGrader, an approach that infers the quality of students' mental models from their multimodal responses using concept graphs as an analytical framework. In our evaluation with 9 openly available models, we found that the best-performing models fall short of human-level performance: they achieved an accuracy of only approximately 40% and a prediction error of 1.1 units, even though their scoring distribution was fairly aligned with human scoring patterns. With improved accuracy, such models could be highly effective assistants to teachers, helping them infer the mental models of their entire classrooms efficiently and improve their pedagogy by designing targeted help sessions and lectures that strengthen areas where students collectively demonstrate lower proficiency.

Paper Structure (14 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Concept hierarchy for the topic of vectors
  • Figure 2: A concept graph for addition of vectors
  • Figure 3: Overview of MMGrader
  • Figure 4: Sample from our dataset
  • Figure 5: Image provided as part of one of the questions in the dataset