Extracting Social Connections from Finnish Karelian Refugee Interviews Using LLMs

Joonatan Laato; Jenna Kanerva; John Loehr; Virpi Lummaa; Filip Ginter

TL;DR

This paper investigates zero-shot extraction of social organizations and hobbies from a corpus of 89,339 Finnish Karelian refugee interviews, comparing GPT-4, open LLMs, and a FinBERT-based supervised approach. Using careful prompt engineering and analyses of batching and prompt language, the authors show that GPT-4 achieves an F-score of 88.8%, that a strong open model (Llama-3-70B-Instruct) nearly matches it at 87.7%, and that FinBERT fine-tuned on GPT-4-generated data reaches 84.1% with 6K training interviews and 86.3% with 30K. The study also runs the extraction over the full dataset, yielding hundreds of thousands of hobby and organization mentions, and analyzes energy costs and scalability. Together, the results indicate that open and hybrid approaches are viable for large-scale information extraction from non-English historical corpora, with practical implications for digital humanities and migration research.

Abstract

We performed a zero-shot information extraction study on a historical collection of 89,339 brief Finnish-language interviews of refugee families relocated post-WWII from Finnish Eastern Karelia. Our research objective is twofold. First, we aim to extract social organizations and hobbies from the free text of the interviews, separately for each family member. These can act as a proxy variable indicating the degree of social integration of refugees in their new environment. Second, we aim to evaluate several alternative ways to approach this task, comparing a number of generative models and a supervised learning approach, to gain broader insight into the relative merits of these approaches and their applicability in similar studies. We find that the best generative model (GPT-4) is roughly on par with human performance, at an F-score of 88.8%. Interestingly, the best open generative model (Llama-3-70B-Instruct) reaches almost the same performance, at an F-score of 87.7%, demonstrating that open models are becoming a viable alternative for some practical tasks even on non-English data. Additionally, we test a supervised learning alternative, in which we fine-tune a Finnish BERT model (FinBERT) on GPT-4-generated training data. With this method, we achieve an F-score of 84.1% with as few as 6K interviews, rising to 86.3% with 30K interviews. Such an approach would be particularly appealing in cases where computational resources are limited or there is a large volume of data to process.
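To make the reported F-scores concrete, the following is a minimal sketch of how a micro-averaged F-score could be computed over per-person sets of extracted mentions. The data layout (keys of interview ID and family-member role), the exact-match criterion, and the example mentions are assumptions made for illustration; the abstract does not specify the evaluation procedure at this level of detail.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: (interview_id, family_member_role) -> set of mentions.
Annotations = Dict[Tuple[str, str], Set[str]]


def micro_f_score(gold: Annotations, predicted: Annotations) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score), pooling counts over all persons.

    Assumes a mention counts as correct only on an exact string match
    for the same person; this matching rule is an illustrative choice.
    """
    tp = fp = fn = 0
    for key in gold.keys() | predicted.keys():
        g = gold.get(key, set())
        p = predicted.get(key, set())
        tp += len(g & p)  # mentions found by both
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented toy data, not taken from the corpus.
gold = {
    ("0001", "primary"): {"Martat", "kuoro"},
    ("0001", "spouse"): {"hiihto"},
}
predicted = {
    ("0001", "primary"): {"Martat"},
    ("0001", "spouse"): {"hiihto", "kalastus"},
}
precision, recall, f_score = micro_f_score(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F={f_score:.3f}")
```

Keying the counts by person rather than by interview reflects the stated goal of extracting mentions separately for each family member: a correct mention attributed to the wrong person is counted as both a false positive and a false negative.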
