For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives

Lia Morra; Antonio Santangelo; Pietro Basci; Luca Piano; Fabio Garcea; Fabrizio Lamberti; Massimo Leone

TL;DR

The paper introduces FRESCO, a framework that bridges visual semiotics and computer vision to analyze large-scale archives of facial images from social media. It operationalizes the semiotic triad of plastic, figurative, and enunciative levels as quantitative traits extracted with state-of-the-art CV models, and defines the FRESCO-Score, an interpretable, multi-level similarity metric. The approach is validated on two public datasets (FFHQ-in-the-wild and MIAP OpenImages), assessing both the accuracy of the extracted quantities and the usefulness of the score for content-based retrieval and sociocultural interpretation. The authors discuss limitations, such as model dependencies and out-of-distribution concerns, and outline future directions, including integration with external knowledge and extension to broader image archives, to improve robustness and applicability.

Abstract

Social networks are creating a digital world in which the cognitive, emotional, and pragmatic value of the imagery of human faces and bodies is arguably changing. However, researchers in the digital humanities are often ill-equipped to study these phenomena at scale. This work presents FRESCO (Face Representation in E-Societies through Computational Observation), a framework designed to explore the socio-cultural implications of images on social media platforms at scale. FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques, aligning with the principles of visual semiotics. The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer. These levels are analyzed to discern deeper narrative layers within the imagery. Experimental validation confirms the reliability and utility of FRESCO, and we assess its consistency and precision across two public datasets. Subsequently, we introduce the FRESCO score, a metric derived from the framework's output that serves as a reliable measure of similarity in image content.

Paper Structure (24 sections, 4 equations, 13 figures, 4 tables)

Introduction
Related work
Inferring personality from social media
Computational analysis in media and art history
Semiotics and computational analysis
Background
The FRESCO architecture
Conceptual design
Implementation
Built-in models
Structured data extraction
The FRESCO Similarity Score
Image-level measures
Mapping strategy
Instance-level measures
...and 9 more sections

Figures (13)

Figure 1: The FRESCO (Face Representation in E-Societies through Computational Observation) pipeline extracts quantifiable traits from images using state-of-the-art computer vision and deep learning tools. The traits are not limited to facial and body characteristics, but also cover interaction with the context and background, the presence of textual elements, and so forth. They are grouped into plastic (colors, forms), figurative (objects and actions), and enunciative (gazes and mutual placements) categories, following principles from structural visual semiotics.
Figure 2: The profile of a mountain climber
Figure 3: Picture of a man looking towards the clouds beneath him
Figure 4: A set of images with similar meaning
Figure 5: We compute the position of the centroid of each identified object or person with respect to the vertical and horizontal midlines, as well as its distance from the image center, to determine where each object or person sits within the image frame. All positions are rescaled to the range 0 to 1 and are therefore independent of image size.
...and 8 more figures
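
The positional measures described in the Figure 5 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `frame_position`, the bounding-box input format, and the normalization (half-width, half-height, and half-diagonal as divisors) are assumptions chosen so that every value falls between 0 and 1 regardless of image size.

```python
import math
from dataclasses import dataclass


@dataclass
class FramePosition:
    """Normalized position of a detection within the image frame (hypothetical type)."""
    dx: float    # distance of the centroid from the vertical midline, in [0, 1]
    dy: float    # distance of the centroid from the horizontal midline, in [0, 1]
    dist: float  # distance of the centroid from the image center, in [0, 1]


def frame_position(box, width, height):
    """Compute normalized centroid offsets for one bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels; width and height are the
    image size in pixels. The normalization below is an assumption: offsets
    are divided by the half-width and half-height, and the center distance
    by the half-diagonal, so a centroid at an image corner scores 1.0.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    half_w, half_h = width / 2.0, height / 2.0

    dx = abs(cx - half_w) / half_w
    dy = abs(cy - half_h) / half_h
    dist = math.hypot(cx - half_w, cy - half_h) / math.hypot(half_w, half_h)
    return FramePosition(dx, dy, dist)


# The same relative placement gives the same values at two image sizes.
small = frame_position((10, 20, 30, 40), width=200, height=100)
large = frame_position((20, 40, 60, 80), width=400, height=200)
```

Because every quantity is a ratio of pixel distances, `small` and `large` agree even though the second image is twice as large, which is the size independence the caption refers to.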
