Mapping the Past: Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata
Axel Ahlin, Alfred Myrne, Pierre Nugues
TL;DR
This paper extracts the location entries of the early 20th-century Swedish encyclopedia Nordisk Familjebok (Uggleupplagan) and links them geographically to Wikidata. A semi-automatic pipeline combines a fine-tuned KB-BERT classifier, which detects geographic entries, with SBERT-based candidate ranking, which maps entries to Wikidata items and retrieves their coordinates, giving a historical GIS view of the encyclopedia’s geographic coverage. The study yields 28,284 location candidates and 17,793 coordinates, concentrated in Sweden, Germany, and the United Kingdom, and it highlights methodological challenges in disambiguation and in handling historical data. The resulting dataset and workflow support comparative analyses across time periods and encyclopedias, as well as future investigations into how historical information was selected and represented.
Abstract
In this paper, we describe the extraction of all the location entries from a prominent Swedish encyclopedia from the early 20th century, the \textit{Nordisk Familjebok} `Nordic Family Book.' We focused on the second edition, called \textit{Uggleupplagan}, which comprises 38 volumes and over 182,000 articles, making it one of the most extensive Swedish encyclopedias. Using a classifier, we first determined the category of the entries and found that approximately 22 percent of them were locations. We then applied named entity recognition to these entries and linked them to Wikidata. Wikidata enabled us to extract their precise geographic locations, resulting in almost 18,000 valid coordinates. Finally, we analyzed the distribution of these locations and the entry selection process. The distribution showed a higher density within Sweden, Germany, and the United Kingdom. The paper sheds light on the selection and representation of geographic information in the \textit{Nordisk Familjebok}, providing insights into historical and societal perspectives. It also paves the way for future investigations into entry selection in different time periods and comparative analyses among various encyclopedias.
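The linking step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that sentence embeddings for an encyclopedia entry and for each Wikidata candidate description have already been computed (for example with an SBERT-style model), and it uses toy three-dimensional vectors in their place. The function names `rank_candidates` and `parse_wkt_point`, the placeholder identifiers `Q_A`, `Q_B`, `Q_C`, and the example coordinate values are all hypothetical. The coordinate parser assumes the `Point(longitude latitude)` literal form in which the Wikidata Query Service returns coordinate values.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(entry_vec, candidates):
    """Sort (candidate_id, vector) pairs by similarity to the entry, best first."""
    scored = [(cid, cosine(entry_vec, vec)) for cid, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def parse_wkt_point(wkt):
    """Turn a 'Point(lon lat)' literal into a (lat, lon) tuple of floats."""
    match = re.fullmatch(r"\s*Point\(\s*(\S+)\s+(\S+)\s*\)\s*", wkt, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


# Toy stand-ins for embeddings; real vectors would come from an embedding model.
entry_vec = [0.9, 0.1, 0.0]
candidates = [
    ("Q_A", [1.0, 0.0, 0.0]),  # hypothetical candidate identifiers
    ("Q_B", [0.0, 1.0, 0.0]),
    ("Q_C", [0.7, 0.7, 0.0]),
]

ranking = rank_candidates(entry_vec, candidates)
best_id = ranking[0][0]

# Hypothetical coordinate literal for the best candidate (longitude first).
lat, lon = parse_wkt_point("Point(13.19 55.70)")
print(best_id, lat, lon)
```

Ranking by cosine similarity keeps the example independent of any particular embedding library. In practice, the top-ranked candidate would still need checking, since the abstract's counts indicate that not every location entry ends up with a valid coordinate.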
