
CognArtive: Large Language Models for Automating Art Analysis and Decoding Aesthetic Elements

Afshin Khadangi, Amir Sartipi, Igor Tchappi, Gilbert Fridgen

TL;DR

The paper tackles automating formal art analysis by applying multimodal large language models to decode technical and expressive elements of artworks. It introduces a pipeline that combines GPT-4V, Gemini 2.0, and GPT-4 to analyze over 15,000 works from 23 artists across 34 styles, guided by an eight-question criteria set. An embedding-based evaluation against ground-truth style descriptions uses four models to quantify how well automated analyses align with stylistic descriptors. The results reveal consistent patterns in form, color, light, movement, and technique over time, and they demonstrate the scalability and potential of AI-assisted art analysis for historians, educators, and enthusiasts.
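At its core, the embedding-based evaluation compares two text embeddings with cosine similarity. The sketch below shows that scoring step in plain Python. It makes several assumptions. The vectors are hypothetical toy stand-ins for real outputs of models such as SBERT, BGE-m3, OpenAI, or NV-Embed-v2. The helper names `cosine_similarity` and `rank_styles` are illustrative and do not come from the paper. Presumably the same scoring is repeated once per embedding model; how the four models' scores are combined is not stated here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_styles(analysis_vec: list[float], style_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score one analysis embedding against every style-description embedding,
    returning (style, score) pairs from most to least similar."""
    scores = {style: cosine_similarity(analysis_vec, vec) for style, vec in style_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 3-dimensional toy embeddings; real models produce hundreds of dimensions.
style_vecs = {
    "Impressionism": [0.9, 0.1, 0.2],
    "Cubism": [0.1, 0.9, 0.3],
}
analysis_vec = [0.8, 0.2, 0.1]
print(rank_styles(analysis_vec, style_vecs))
```

A higher score means the synthesized analysis reads closer to that style's ground-truth description. Cosine similarity depends only on direction and not on vector length, so embeddings of differently sized texts stay comparable.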

Abstract

Art, as a universal language, can be interpreted in diverse ways, with artworks embodying profound meanings and nuances. The advent of Large Language Models (LLMs) and the availability of Multimodal Large Language Models (MLLMs) raise the question of how these transformative models can be used to assess and interpret the artistic elements of artworks. Research has been conducted in this domain. To the best of our knowledge, however, LLMs have not yet been used to build a deep and detailed understanding of the technical and expressive features of artworks. In this study, we investigate automating a formal art analysis framework so that a large number of artworks can be analyzed rapidly, and we examine how their patterns evolve over time. We explore how LLMs can decode artistic expressions, visual elements, composition, and techniques, revealing patterns that emerge and develop across periods. Finally, we discuss the strengths and limitations of LLMs in this context, emphasizing their ability to process vast quantities of art-related data and generate insightful interpretations. Because the results are exhaustive and granular, we have developed interactive data visualizations, available online at https://cognartive.github.io/, to enhance understanding and accessibility.

Paper Structure

This paper contains 15 sections, 23 figures, 1 table.

Figures (23)

  • Figure 1: Distribution of the number of artworks across different styles for individual artists. We retrieved more than 15,000 artworks across 23 artists for our study.
  • Figure 2: Illustration of our analysis framework for decoding aesthetics, which integrates both technical and expressive features of digitized artworks. The analysis process begins by submitting the artwork image along with eight predefined technical and conceptual questions to the GPT-4V API. The responses are subsequently processed by GPT-4 and Gemini 2.0 to extract and synthesize insights using both qualitative and quantitative art metrics. To evaluate the results, we compute the cosine similarity score between the text embeddings of art styles and the synthesized analysis results, leveraging four embedding models: SBERT, BGE-m3, OpenAI, and NVIDIA's NV-Embed-v2.
  • Figure 3: Demonstration of the technical and expressive range of questions we employed in our methodology to analyze the artworks, as incorporated into the API request prompts. Questions 1-7 were carefully sourced from hodge2024elements to ensure that our analysis framework complies with established art assessment guidelines and expertise.
  • Figure S1: Distribution of form types in artworks (Natural, Geometric, Regular, and Irregular) across different years. The chart shows the cumulative number of artworks for each form type.
  • Figure S2: Distribution of scale types in artworks (Realistic, Oversized/Large, and Reduced/Small) across different years. The chart shows the cumulative number of artworks for each scale type.
  • ...and 18 more figures
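Figures S1 and S2 plot a cumulative number of artworks per year for each form or scale type. The minimal sketch below shows that aggregation under two assumptions. First, each analyzed artwork has already been reduced to a (year, form type) pair. Second, "cumulative" means a running total over years. Both the input format and the function name `cumulative_counts` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def cumulative_counts(records):
    """records: iterable of (year, form_type) pairs (hypothetical format).
    Returns {form_type: [(year, running_total), ...]} over all observed years."""
    records = list(records)  # materialize so generators can be iterated twice
    years = sorted({year for year, _ in records})
    per_form = defaultdict(Counter)
    for year, form in records:
        per_form[form][year] += 1
    result = {}
    for form, counts in per_form.items():
        total = 0
        series = []
        for year in years:
            total += counts[year]  # Counter yields 0 for years with no artworks of this type
            series.append((year, total))
        result[form] = series
    return result


# Hypothetical labels; in the pipeline these would come from the synthesized analyses.
records = [
    (1870, "Natural"), (1870, "Irregular"),
    (1872, "Natural"), (1875, "Geometric"), (1875, "Natural"),
]
for form, series in sorted(cumulative_counts(records).items()):
    print(form, series)
```

A year with no artworks of a given type still appears in that type's series, carrying the previous total forward. This keeps every type aligned on one year axis for plotting.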