Mining Asymmetric Intertextuality

Pak Kin Lau, Stuart Michael McManus

TL;DR

This paper proposes a scalable and adaptive approach for mining asymmetric intertextuality, leveraging a split-normalize-merge paradigm, which is particularly well-suited for dynamically growing corpora, such as expanding literary archives or historical databases.

Abstract

This paper introduces a new task in Natural Language Processing (NLP) and Digital Humanities (DH): Mining Asymmetric Intertextuality. Asymmetric intertextuality refers to one-sided relationships between texts, where one text cites, quotes, or borrows from another without reciprocation. These relationships are common in literature and historical texts, where a later work references a classical or older text that remains static. We propose a scalable and adaptive approach for mining asymmetric intertextuality, leveraging a split-normalize-merge paradigm. In this approach, documents are split into smaller chunks, normalized into structured data using LLM-assisted metadata extraction, and merged during querying to detect both explicit and implicit intertextual relationships. Our system handles intertextuality at various levels, from direct quotations to paraphrasing and cross-document influence, using a combination of metadata filtering, vector similarity search, and LLM-based verification. This method is particularly well-suited for dynamically growing corpora, such as expanding literary archives or historical databases. Because new documents can be integrated continuously, the system scales efficiently, making it valuable for digital humanities practitioners in literary studies, historical research, and related fields.
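To make the split-normalize-merge idea concrete, here is a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the names `Chunk`, `split`, `embed`, `verify`, and `mine_links` are hypothetical, the `year` metadata is supplied by hand rather than extracted by an LLM, a bag-of-words vector stands in for a learned embedding, and a shared three-word run stands in for LLM-based verification.

```python
import math
import re
from collections import Counter
from typing import NamedTuple


class Chunk(NamedTuple):
    doc_id: str
    year: int   # hypothetical metadata; the paper extracts metadata with an LLM
    text: str


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def split(doc_id, year, text, size=12):
    """Split step: cut a document into fixed-size word windows."""
    words = text.split()
    return [Chunk(doc_id, year, " ".join(words[i:i + size]))
            for i in range(0, len(words), size)]


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(tokens(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def verify(query, candidate):
    """Stand-in for LLM verification: require a shared run of three words."""
    q = tokens(query.text)
    c = " " + " ".join(tokens(candidate.text)) + " "
    return any(" " + " ".join(q[i:i + 3]) + " " in c
               for i in range(len(q) - 2))


def mine_links(query, index, top_k=3, threshold=0.2):
    """Merge step: metadata filter, similarity ranking, then verification."""
    # Asymmetry: only strictly older chunks from other documents can be sources.
    candidates = [c for c in index
                  if c.year < query.year and c.doc_id != query.doc_id]
    q_vec = embed(query.text)
    scored = sorted(((cosine(q_vec, embed(c.text)), c) for c in candidates),
                    key=lambda pair: pair[0], reverse=True)
    return [(score, c) for score, c in scored[:top_k]
            if score >= threshold and verify(query, c)]


# Made-up example documents: an older source and a later text that borrows from it.
index = split("hamlet", 1603, "To be, or not to be, that is the question")
later = split("essay", 1950,
              "As the prince asks, to be or not to be, and we still wonder")[0]

for score, source in mine_links(later, index):
    print(f"{later.doc_id} -> {source.doc_id} (similarity {score:.2f})")
```

Querying in the opposite direction, `mine_links(index[0], [later])`, returns an empty list because the date filter rejects any candidate that is not older than the query. That is the one-sided behavior the task description calls for. New documents can be added to `index` at any time without reprocessing existing chunks, which is the property that makes the paradigm suit growing corpora.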

Paper Structure

This paper contains 55 sections, 3 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Types of intertextuality. Quotation and Citation (colored in green) have obvious cues.
  • Figure 2: Comparison of Symmetric Intertextual Link Mining (left) and Asymmetric Intertextual Link Mining (right). Both problems share common steps such as candidate subsetting in vector space and post-processing. However, the asymmetric approach introduces innovative steps (blue nodes): Metadata-Enriched Hierarchical Chunking, Hybrid Search with Metadata Filtering, and LLM-Based Verification, which enhance the accuracy of intertextual link mining.