Level of agreement between emotions generated by Artificial Intelligence and human evaluation: a methodological proposal

Miguel Carrasco, Cesar Gonzalez-Martin, Sonia Navajas-Torrente, Raul Dastres

TL;DR

A generally good level of agreement was found both among participants and between the observers' responses and the AI-generated emotions, although the study also confirms the subjectivity inherent in emotional evaluation.

Abstract

Images are capable of conveying emotions, but emotional experience is highly subjective. Advances in artificial intelligence have enabled the generation of images based on emotional descriptions. However, the level of agreement between the generated images and human emotional responses has not yet been evaluated. To address this, 20 artistic landscapes were generated using StyleGAN2-ADA. Four variants were created for each image, two evoking positive emotions (contentment, amusement) and two evoking negative emotions (fear, sadness), resulting in 80 pictures. An online questionnaire was designed using this material, in which 61 observers classified the generated images. Statistical analyses were performed on the collected data to determine the level of agreement among participants and between the observers' responses and the AI-generated emotions. A generally good level of agreement was found, with better results for negative emotions. However, the study confirms the subjectivity inherent in emotional evaluation.

Paper Structure

This paper contains 21 sections, 13 figures, 11 tables.

Figures (13)

  • Figure 1: General scheme of the evaluation process for emotions generated by a generative neural network. The method comprises three stages: data preparation, modelling, and evaluation.
  • Figure 2: Proposed methodology for evaluating emotions generated by a generative network. Each stage contains multiple sub-stages dedicated to image development and evaluation.
  • Figure 3: Examples of artistic works generated by the StyleGAN2-ADA tool from a landscape dataset with four emotional categories. All images are newly generated; no similar images exist in the training set.
  • Figure 4: Sociodemographic data of the study participants: age (boxplot), gender (male, female), country, area of study, and highest level of study obtained. The groupings used in the study are described in the results section.
  • Figure 5: Evaluation process and agreement between the mode of the observers' votes and the StyleGAN2-ADA tool. Each participant votes on each of the images. The mode is then calculated for each image to obtain its representative emotion, which is compared with the emotional label assigned by the generative tool.
  • ...and 8 more figures
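
The mode-based comparison described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the image identifiers, and the toy votes are all hypothetical, and ties between equally frequent emotions are resolved by whichever label was seen first, which is an assumption the caption does not address.

```python
from collections import Counter


def modal_emotion(votes):
    """Return the most frequent emotion label among one image's votes.

    Assumption: ties go to the label encountered first in `votes`.
    """
    return Counter(votes).most_common(1)[0][0]


def agreement_with_generator(votes_by_image, generated_label):
    """Compare each image's modal vote with the label used to generate it.

    Returns the per-image modes and the fraction of images whose mode
    matches the generator's label.
    """
    modes = {img: modal_emotion(v) for img, v in votes_by_image.items()}
    matches = sum(modes[img] == generated_label[img] for img in modes)
    return modes, matches / len(modes)


# Hypothetical toy data (not taken from the study): three images, three votes each.
votes_by_image = {
    "img_01": ["fear", "fear", "sadness"],
    "img_02": ["contentment", "amusement", "amusement"],
    "img_03": ["sadness", "sadness", "fear"],
}
generated_label = {
    "img_01": "fear",
    "img_02": "contentment",
    "img_03": "sadness",
}

modes, rate = agreement_with_generator(votes_by_image, generated_label)
print(modes)
print(f"agreement rate: {rate:.2f}")
```

In this toy case the modal vote matches the generated label for two of the three images. A raw match rate like this does not correct for chance agreement, so it is only a first look at the comparison the caption describes.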