Identifying Speakers and Addressees of Quotations in Novels with Prompt Learning

Yuchen Yan, Hanjie Zhao, Senbin Zhu, Hongde Liu, Zhihong Zhang, Yuxiang Jia

TL;DR

This work tackles the problem of identifying both the speaker and addressee of quotations in novels by constructing JY-QuotePlus, the first Chinese corpus annotated for speaker, addressee, speaking mode, and linguistic cue. It frames speaker/addressee identification as an extractive reading comprehension task and employs prompt-learning-based fine-tuning of pre-trained models (T5 for English, PromptCLUE for Chinese), comparing against large language model (LLM) baselines. Experiments on Chinese JY-QuotePlus and English RiQua show that the fine-tuned pre-trained models significantly outperform zero-shot and few-shot LLMs, achieving 84.36% overall accuracy on the Chinese corpus and 69.60% on the English corpus for joint speaker/addressee identification; detailed per-entity analyses are also reported. The work validates the approach for enhancing literary analysis of character relationships and provides publicly available data for further research in quotation processing and dialogue understanding in novels.

Abstract

Quotations in literary works, especially novels, are important for creating characters, reflecting character relationships, and driving plot development. Current research on quotation extraction in novels focuses primarily on quotation attribution, i.e., identifying the speaker of a quotation. However, the addressee of a quotation is also important for constructing the relationship between the speaker and the addressee. To tackle the problem of dataset scarcity, we annotate the first Chinese quotation corpus with elements including speaker, addressee, speaking mode, and linguistic cue. We propose prompt learning-based methods for speaker and addressee identification built on fine-tuned pre-trained models. Experiments on both Chinese and English datasets show the effectiveness of the proposed methods, which outperform methods based on zero-shot and few-shot large language models.

Paper Structure (13 sections, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Examples of Quotation Speaker and Addressee
  • Figure 2: Part of the Character Verbal Relationship Network
  • Figure 3: The Prompt Learning based Speaker and Addressee Identification Model
  • Figure 4: Designed Prompts
  • Figure 5: Few-Shot Prompts