Curating art exhibitions using machine learning

Eurico Covas

TL;DR

The paper tackles the problem of curating art exhibitions by teaching AI to imitate human curators, using a dataset of past exhibitions from the Met Museum. It compares four modeling approaches, ranging from self-contained text statistics to OpenAI embeddings and a fine-tuned GPT-4o-mini system, all trained on $(x, y)$ pairs where $x$ is the exhibition text and $y$ encodes the artwork metadata. Across 80/20 train/validation splits, embedding-based methods, especially the full OpenAI GPT-style mapping, show the strongest signal: they achieve high generalised-tag accuracy for several metadata fields and improved, though imperfect, artwork hit rates. The results demonstrate that modest-size models can approach, and on some metrics exceed, random baselines, and that richer embeddings and fine-tuning yield meaningful improvements, with potential applications in AI-assisted or virtual exhibition curation. Limitations include hallucinations from large language models and data constraints, which underscore the need for robust validation before deployment in real galleries.

Abstract

Here we present a series of four related machine learning models that attempt to learn from existing exhibitions curated by human experts, so that they can perform similar curatorial work. Of our four artificial intelligence models, three imitate the curators of those exhibitions reasonably well, with varying degrees of precision and curatorial coherence. In particular, we draw two key conclusions: first, that these exhibitions contain sufficient information to construct an artificial intelligence model that replicates past exhibitions with an accuracy well above random choice; and second, that feature engineering and careful architecture design can make modest-size models almost as good as a brute-force approach based on so-called large language models such as GPT.

Paper Structure

This paper contains 9 sections, 2 equations, 21 figures, 5 tables.

Figures (21)

  • Figure 1: Example of a toy neural network that maps exhibition data, in this case the title and description (overview text) of the exhibition, to the generalised tags or to a list of artwork suggestions to assign to the input exhibition. Each node or neuron in the network represents a real number, the level of that neuron's activity. The links between the nodes carry weights, which represent the influence of one neuron on the others.
  • Figure 2: Neural network approach using self-contained text vectorisation and the (fixed-length) probability of finding a value in the exhibitions' metadata fields. From left to right, the network takes a variable-length plain text, i.e., the title and description of the exhibition, and vectorises it, producing a numeric vector, an embedding of that text. That vector has 256 dimensions, or numbers. The vector is then passed to a one-dimensional average pooling layer, which averages those 256-dimensional vectors to smaller 64-dimensional vectors. The result is passed to a so-called hidden layer, a dense layer of 256 nodes with an activation function, in this case the ReLU (Rectified Linear Unit) function, defined as $\text{ReLU}(x) = \max(0, x)$, which outputs the input if it is positive and zero otherwise. Finally, a second dense layer with 8615 nodes or neurons and a linear activation function (in this case the identity function) maps the results to the probabilities. In the diagram, $y_{\text{dim}}$ is the number of distinct values of the exhibitions' artwork metadata fields (e.g., ["European Sculpture and Decorative Arts", "The American Wing", "Diego de Pesquera", "1585", "Sculpture", …]). Therefore, $y_{\text{dim}}=8615$. The final outputs are the probabilities that each of those metadata values (what we call generalised tags) shows up in the list of artworks associated with the input title and description. We tested several activation functions, which determine the type of output. We also experimented with the number of hidden layer nodes and settled on 256. The network/training parameters were: number of epochs = 2048, with the text vectorisation using max_tokens = 32768, output_sequence_length = 256, output_mode = "int", and standardize = "lower_and_strip_punctuation". For all runs we used batch_size = 16.
  • Figure 3: Mean squared error (MSE) plotted for the training and validation sets of the neural network in Figure 2. The inset shows the same plot with a logarithmic y-axis.
  • Figure 4: Percentage intersection of generalised tags on the validation set, between the actual tags and those predicted by the model.
  • Figure 5: Percentage of predicted artworks that match the actual artworks on the validation set.
  • ...and 16 more figures
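
The intersection percentages named in the captions of Figures 4 and 5 can be illustrated with a minimal sketch. The captions do not state how the percentage is normalised, so the denominator below is an assumption: by default the overlap is divided by the number of predicted items, with an option to divide by the number of actual items instead. The function names `overlap_percentage` and `mean_overlap_percentage` are hypothetical and are not taken from the paper.

```python
def overlap_percentage(actual, predicted, relative_to="predicted"):
    """Percentage of items shared by `actual` and `predicted`.

    ASSUMPTION: the paper's exact normalisation is not given in the
    captions; `relative_to` selects the denominator ("predicted" or
    "actual"). Duplicates are ignored because items are treated as sets.
    """
    actual_set, predicted_set = set(actual), set(predicted)
    if relative_to == "predicted":
        denominator = len(predicted_set)
    elif relative_to == "actual":
        denominator = len(actual_set)
    else:
        raise ValueError("relative_to must be 'predicted' or 'actual'")
    if denominator == 0:
        # Nothing to compare against: define the overlap as zero.
        return 0.0
    return 100.0 * len(actual_set & predicted_set) / denominator


def mean_overlap_percentage(pairs, relative_to="predicted"):
    """Average the overlap over (actual, predicted) pairs,
    e.g. one pair per validation exhibition."""
    scores = [overlap_percentage(a, p, relative_to) for a, p in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    actual = ["Sculpture", "1585", "The American Wing"]
    predicted = ["Sculpture", "1585", "Paintings", "1600"]
    print(overlap_percentage(actual, predicted))            # 50.0
    print(overlap_percentage(actual, predicted, "actual"))  # about 66.7
```

The same function applies to generalised tags (Figure 4) and to artwork identifiers (Figure 5), since both reduce to comparing two sets per exhibition.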