ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities

Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, Liteng Gao

TL;DR

The paper tackles the challenge of evaluating multimodal large language models (MLLMs) in art education by introducing ArtMentor, a process‑oriented HCI space that collects process data from teacher–MLLM interactions across nine art‑evaluation dimensions. It leverages a multi‑agent architecture (E‑Agent, R‑Agent, S‑Agent) and an HCI dataset of 380 sessions to quantify MLLM capabilities through integrated ML, NLP, and HCI metrics, while enabling iterative upgrades. Key contributions include (1) the ArtMentor space and its freely accessible dataset, (2) a holistic evaluation framework combining entity recognition, style assessment, scoring, and text generation, and (3) empirical findings on GPT‑4o’s strengths and areas for improvement in perception, understanding, and reasoning within art evaluation. The work advances a process‑oriented approach to evaluating AI in education and art, with practical implications for deploying AI copilots in classroom settings and guiding future model refinements.

Abstract

Can Multimodal Large Language Models (MLLMs), with capabilities in perception, recognition, understanding, and reasoning, function as independent assistants in art evaluation dialogues? Current MLLM evaluation methods, which rely on subjective human scoring or costly interviews, lack comprehensive coverage of various scenarios. This paper proposes a process-oriented Human-Computer Interaction (HCI) space design to facilitate more accurate MLLM assessment and development. This approach aids teachers in efficient art evaluation while also recording interactions for MLLM capability assessment. We introduce ArtMentor, a comprehensive space that integrates a dataset and three systems to optimize MLLM evaluation. The dataset consists of 380 sessions conducted by five art teachers across nine critical dimensions. The modular system includes agents for entity recognition, review generation, and suggestion generation, enabling iterative upgrades. Machine learning and natural language processing techniques ensure the reliability of evaluations. The results confirm GPT-4o's effectiveness in assisting teachers in art evaluation dialogues. Our contributions are available at https://artmentor.github.io/.

Paper Structure

This paper contains 57 sections, 18 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: A multi-agent data collection system from ArtMentor, designed specifically to assess GPT-4o's assistance capabilities in art evaluation. It captures interactions across 380 evaluation sessions involving five art teachers and three GPT-4o agents.
  • Figure 2: ArtMentor Space comprises four primary components: a. Multi-Agent Data Collection System, b. HCI Dataset, c. Data Analysis System, d. Iterative Upgrades System. The Multi-Agent Data Collection System includes three agents: Entity Recognition Agent (E-Agent), Review Generation Agent (R-Agent), and Suggestion Generation Agent (S-Agent). Both R-Agent and S-Agent operate across nine dimensions, such as Realism and Deformation. Additionally, we outline nine HCI processes (P1 to P9), where processes initiated by the computer are marked in green and those initiated by the human are marked in orange. Data collection by the Multi-Agent system yields an HCI dataset. We then apply five metrics to evaluate the four MLLM capabilities (perception, recognition, understanding, and reasoning). Based on the evaluation results, we aim to iteratively upgrade underperforming capabilities in future work.
  • Figure 3: E-Agent and art teacher interaction collection.
  • Figure 4: R & S-Agent and art teacher interaction collection.
  • Figure 5: Recognition of art styles by GPT-4o across 20 artworks (Artwork Numbers 1-20).
  • ...and 7 more figures
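
The process-oriented data collection described in the Figure 2 caption can be pictured as a per-session log of HCI steps. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names `Step` and `Session`, their fields, and the `count_by_initiator` helper are hypothetical, and the example's pairing of process labels with initiators is illustrative only (the actual P1 to P9 assignment is given in the paper's Figure 2 and is not reproduced here).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Agent names and process labels are taken from the Figure 2 caption.
AGENTS = ("E-Agent", "R-Agent", "S-Agent")
PROCESSES = tuple(f"P{i}" for i in range(1, 10))  # P1 .. P9
INITIATORS = ("computer", "human")


@dataclass
class Step:
    """One logged HCI step (hypothetical record layout)."""
    process: str    # one of P1 .. P9
    initiator: str  # "computer" or "human"
    agent: str      # which agent the step involves
    payload: str    # free-text content, e.g. a generated review


@dataclass
class Session:
    """One evaluation session between a teacher and the agents."""
    artwork_id: int
    teacher_id: int
    steps: List[Step] = field(default_factory=list)

    def log(self, process: str, initiator: str, agent: str, payload: str) -> None:
        # Reject labels that fall outside the vocabulary named in the caption.
        if process not in PROCESSES:
            raise ValueError(f"unknown process: {process}")
        if initiator not in INITIATORS:
            raise ValueError(f"unknown initiator: {initiator}")
        if agent not in AGENTS:
            raise ValueError(f"unknown agent: {agent}")
        self.steps.append(Step(process, initiator, agent, payload))

    def count_by_initiator(self) -> Dict[str, int]:
        # Tally how many steps each side started (hypothetical summary).
        counts = {name: 0 for name in INITIATORS}
        for step in self.steps:
            counts[step.initiator] += 1
        return counts


# Illustrative usage; the process/initiator pairing here is made up.
session = Session(artwork_id=1, teacher_id=1)
session.log("P1", "computer", "E-Agent", "entities proposed by the model")
session.log("P2", "human", "E-Agent", "teacher corrects the entity list")
session.log("P3", "computer", "R-Agent", "draft review for one dimension")
print(session.count_by_initiator())
```

Keeping the initiator on every step is what would let a later analysis separate model output from teacher corrections, which is the kind of process data the caption describes.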