Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling
Gregorios A Katsios, Ning Sa, Tomek Strzalkowski
TL;DR
The paper investigates whether a single multi-task classifier that detects multiple figurative language (FL) forms can outperform separate binary detectors, and whether the resulting FL-derived features can improve authorship attribution (AA). It builds the Multi-task Figurative Language Model (MFLM) on RoBERTa-Large by first training task-specific binary FL detectors across 13 datasets, then creating a multi-label FL training set for a unified model. Empirical results show that MFLM matches or exceeds the binary detectors on several FL tasks and that, when its embeddings are combined with stylometric and n-gram features, it yields improvements on three AA datasets, sometimes rivaling or surpassing SOTA baselines. The work demonstrates that FL usage encodes personalized author signals and encourages broader integration of figurative features into stylometric analysis for more robust authorship attribution.
Abstract
The identification of Figurative Language (FL) features in text is crucial for various Natural Language Processing (NLP) tasks, where understanding the author's intended meaning and its nuances is key to successful communication. At the same time, a writer's style is most accurately reflected by a specific blend of various FL forms rather than by the use of any single construct, such as metaphor or irony alone. Thus, we postulate that FL features could play an important role in Authorship Attribution (AA) tasks. We believe that ours is the first computational study of AA based on FL use. Accordingly, we propose a Multi-task Figurative Language Model (MFLM) that learns to detect multiple FL features in text at once. We demonstrate, through detailed evaluation across multiple test sets, that our model tends to perform on par with or outperform specialized binary models in FL detection. Subsequently, we evaluate the predictive capability of joint FL features for the AA task on three datasets, observing improved AA performance through the integration of MFLM embeddings.
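
The integration of MFLM embeddings with stylometric and n-gram features can be illustrated with a minimal sketch. It assumes plain concatenation of the three feature groups, which this section does not specify. The function names, the two toy stylometric cues, the n-gram vocabulary, and the two-dimensional placeholder embedding are all hypothetical; in practice the embedding would come from the trained model rather than being written by hand.

```python
from collections import Counter
from typing import List, Sequence


def char_ngram_features(text: str, vocab: Sequence[str], n: int = 3) -> List[float]:
    """Relative frequency of each vocabulary n-gram among the text's character n-grams."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = max(len(grams), 1)  # avoid division by zero on very short texts
    return [counts[g] / total for g in vocab]


def stylometric_features(text: str) -> List[float]:
    """Two toy style cues (hypothetical): mean word length and punctuation rate."""
    words = text.split()
    mean_word_len = sum(len(w) for w in words) / max(len(words), 1)
    punct_rate = sum(ch in ".,;:!?" for ch in text) / max(len(text), 1)
    return [mean_word_len, punct_rate]


def combine_features(fl_embedding: Sequence[float], text: str,
                     vocab: Sequence[str]) -> List[float]:
    """Concatenate FL embedding, stylometric cues, and n-gram frequencies (assumed scheme)."""
    return (list(fl_embedding)
            + stylometric_features(text)
            + char_ngram_features(text, vocab))


# Placeholder embedding standing in for an MFLM output vector (assumption).
placeholder_embedding = [0.1, 0.9]
vector = combine_features(placeholder_embedding, "Time is a thief.", ["Tim", "ief"])
```

The combined vector could then be passed to any standard classifier over candidate authors; the choice of classifier and of feature scaling is left open here.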
