Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling

Gregorios A Katsios, Ning Sa, Tomek Strzalkowski

TL;DR

The paper investigates whether a single multi-task classifier that detects multiple figurative language (FL) forms can outperform separate binary detectors, and whether the resulting FL-derived features can improve authorship attribution (AA). It builds MFLM on RoBERTa-Large by first training task-specific binary FL detectors across 13 datasets, then creating a multi-label FL training set for a unified model. Empirical results show that MFLM matches or exceeds the binary detectors on several FL tasks and that, when its embeddings are combined with stylometric and n-gram features, AA performance improves on three datasets, sometimes rivaling or surpassing state-of-the-art baselines. The work indicates that FL usage encodes author-specific signals and encourages broader integration of figurative features into stylometric analysis for more robust authorship attribution.

Abstract

The identification of Figurative Language (FL) features in text is crucial for various Natural Language Processing (NLP) tasks, where understanding of the author's intended meaning and its nuances is key for successful communication. At the same time, the use of a specific blend of various FL forms most accurately reflects a writer's style, rather than the use of any single construct, such as just metaphors or irony. Thus, we postulate that FL features could play an important role in Authorship Attribution (AA) tasks. We believe that ours is the first computational study of AA based on FL use. Accordingly, we propose a Multi-task Figurative Language Model (MFLM) that learns to detect multiple FL features in text at once. We demonstrate, through detailed evaluation across multiple test sets, that our model tends to perform on par with or outperform specialized binary models in FL detection. Subsequently, we evaluate the predictive capability of joint FL features towards the AA task on three datasets, observing improved AA performance through the integration of MFLM embeddings.

Paper Structure

This paper contains 19 sections, 2 figures, and 11 tables.

Figures (2)

  • Figure 1: Diagram illustrating our pipeline of training the individual binary FL models, augmenting the FL training collection with predicted labels and fine-tuning the MFLM.
  • Figure 2: Diagrammatic representation of our Authorship Attribution training and evaluation approach. Following this process, any baseline can take the place of the "MFLM" rectangle.
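
The label-augmentation step described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the task names, the `augment_labels` helper, and the keyword-based toy detectors are all hypothetical stand-ins, and keeping the gold label for an example's own task while filling the remaining tasks with detector predictions is an assumption about how the multi-label training set is assembled.

```python
from typing import Callable, Dict, List

# Hypothetical FL task names; the paper's actual label set is not listed here.
FL_TASKS = ["metaphor", "irony", "simile"]

# A detector maps a text to 1 (FL form present) or 0 (absent).
Detector = Callable[[str], int]


def augment_labels(
    examples: List[dict],
    detectors: Dict[str, Detector],
    tasks: List[str] = FL_TASKS,
) -> List[dict]:
    """Turn single-task examples into multi-label ones.

    Each input example has "text", "task" (the FL form it was annotated
    for) and "label" (gold 0/1 for that task). The gold label is kept;
    labels for every other task come from that task's binary detector.
    """
    augmented = []
    for ex in examples:
        labels = []
        for task in tasks:
            if task == ex["task"]:
                labels.append(ex["label"])  # assumed: gold annotation is kept
            else:
                labels.append(detectors[task](ex["text"]))  # predicted label
        augmented.append({"text": ex["text"], "labels": labels})
    return augmented


# Toy keyword "detectors" standing in for fine-tuned binary classifiers.
toy_detectors: Dict[str, Detector] = {
    "metaphor": lambda t: int("sea of" in t),
    "irony": lambda t: int("oh great" in t.lower()),
    "simile": lambda t: int(" like a " in t),
}

data = [
    {"text": "Oh great, a sea of paperwork.", "task": "irony", "label": 1},
    {"text": "He ran like a gazelle.", "task": "simile", "label": 1},
]

multi_label_data = augment_labels(data, toy_detectors)
for row in multi_label_data:
    print(row["labels"], row["text"])
```

Each resulting `labels` vector has one entry per task in `FL_TASKS`, which is the shape a multi-label classifier would be fine-tuned on. In practice the toy lambdas would be replaced by the trained binary detectors.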