The art of connections: constructing a social network from the correspondence archive of Sybren Valkema

Vera Provatorova, Carlotta Capurro, Evangelos Kanoulas

Abstract

Social network analysis allows researchers to discover insights from connections between people. While building a social network is relatively straightforward for contemporary social media, deriving connections from historical archives remains difficult, with every data collection presenting its own challenges. Our contribution focuses on building and analysing a social network from the correspondence archive of Sybren Valkema (1916-1996), a Dutch glass artist and educator. The archive contains both typewritten and handwritten documents in multiple languages, and includes letters from glass artists, art students, art collectors and other agents. We develop an automatic pipeline that separates handwritten and typed documents, performs text recognition specific to the document modality, extracts names of people from the text using named entity recognition, de-duplicates the resulting names to create actor nodes, classifies the actors using entity linking, and, finally, connects the actors and analyses the resulting network. Every part of the pipeline is evaluated against a manual analysis performed by an art historian on a subset of the data collection, in order to find out which pitfalls of the automatic approach need to be resolved in future work and, conversely, whether the automatic approach reveals any additional insights. The results show strong performance in discovering sender-receiver connections as well as additional meaningful connections in text, with the main challenge being text recognition on scanned pages.

Paper Structure

This paper contains 19 sections, 5 figures and 4 tables.

Figures (5)

  • Figure 1: Example of pairwise string similarity scores calculated for record linkage (Step 3 of the pipeline).
  • Figure 2: Visualisation of the three networks constructed from the correspondence archive. Node colours reflect countries of residence retrieved for the actors: orange is the Netherlands, blue is the USA and all remaining nodes are purple.
  • Figure 3: Results of manually evaluating the pipeline on a sample of data.
  • Figure 4: Top-10 most frequent unique entities in the corpus after 3 steps of the pipeline: NER, record linkage and entity linking.
  • Figure 5: Centrality profiles of the top-10 most prominent nodes in the three networks.
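
Figures 1 and 5 refer to pairwise string similarity for record linkage and to centrality profiles of network nodes. As a rough illustration of how these steps might fit together, the sketch below de-duplicates name variants, connects senders and receivers, and computes degree centrality. It is not the authors' implementation: the similarity measure (Python's `difflib.SequenceMatcher`), the 0.85 threshold, the greedy clustering strategy, the helper names and the example names are all assumptions made for this example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(names, threshold=0.85):
    """Greedy record linkage: a name joins the first cluster whose
    representative (its first member) is at least `threshold` similar;
    otherwise it starts a new cluster. The threshold is hypothetical."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    # Map every variant to the representative of its cluster.
    return {variant: cluster[0] for cluster in clusters for variant in cluster}


def build_network(letters, canonical):
    """Count letters per undirected sender-receiver pair of actor nodes."""
    edges = defaultdict(int)
    for sender, receiver in letters:
        a, b = canonical[sender], canonical[receiver]
        if a != b:  # skip self-loops created by merged variants
            edges[frozenset((a, b))] += 1
    return dict(edges)


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbours = defaultdict(set)
    for pair in edges:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Made-up names and letters, purely for illustration.
names = ["Jan Jansen", "Jan Janssen", "Anna de Vries", "Piet Bakker"]
letters = [
    ("Jan Janssen", "Anna de Vries"),
    ("Jan Jansen", "Anna de Vries"),
    ("Anna de Vries", "Piet Bakker"),
]

canonical = link_records(names)
edges = build_network(letters, canonical)
centrality = degree_centrality(edges)
```

A greedy pass like this depends on the order of the input names and on the chosen threshold; a real archive with OCR noise and initials would likely need a more careful clustering step and manual review of borderline pairs.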