Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution
Milad Alshomary, Narutatsu Ri, Marianna Apidianaki, Ajay Patel, Smaranda Muresan, Kathleen McKeown
TL;DR
The paper tackles the opacity of the latent embeddings used by state-of-the-art authorship attribution (AA) systems. It proposes a bottom-up interpretation of the latent space: author embeddings are clustered into a small set of representative points, and each point is mapped to a distribution over writing-style features generated by LLMs. The resulting interpretable space aligns strongly with the original latent space (Pearson $r=0.79$), the style descriptions are validated by humans (72% preference), and human accuracy on the AA task improves by an average of $+20\%$ when the explanations are available. By combining clustering, LLM-derived style descriptors, and user studies, the work establishes both the plausibility of the explanations and their practical value for explainability and human performance in AA. Overall, it offers a scalable, interpretable framework for understanding and validating the latent representations of AA models, with potential impact on forensic linguistics and real-world authorship analysis.
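The clustering-and-mapping step summarized above can be illustrated with a small sketch. This is not the authors' implementation and rests on several assumptions: plain k-means stands in for whatever clustering the paper uses; the per-author style-feature lists are hard-coded toy strings standing in for LLM-generated descriptions; and the softmax soft assignment used to place a new embedding in the interpretable space is a hypothetical choice. The function names `kmeans`, `cluster_style_distributions`, and `interpret` are illustrative only.

```python
import math
import random
from collections import Counter


# Assumption: plain k-means as a stand-in for the paper's clustering method.
def kmeans(points, k, iters=20, seed=0):
    """Cluster embeddings into k representative points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: centroid = mean of its members; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_style_distributions(labels, author_features, k):
    """Pool the style features of each cluster's authors into a normalized distribution.

    Assumption: author_features[i] is a list of style-feature strings for author i
    (in the paper these come from an LLM; here they are supplied by hand).
    """
    dists = []
    for c in range(k):
        counts = Counter()
        for lab, feats in zip(labels, author_features):
            if lab == c:
                counts.update(feats)
        total = sum(counts.values())
        dists.append({f: n / total for f, n in counts.items()} if total else {})
    return dists


def interpret(embedding, centroids, dists, temperature=1.0):
    """Hypothetical soft mapping of a new embedding into the interpretable space:
    softmax over negative squared distances to the representative points,
    then a weighted mixture of their style-feature distributions."""
    d2 = [sum((a - b) ** 2 for a, b in zip(embedding, c)) for c in centroids]
    m = min(d2)  # subtract the minimum for numerical stability
    weights = [math.exp(-(x - m) / temperature) for x in d2]
    z = sum(weights)
    mixed = Counter()
    for w, dist in zip(weights, dists):
        for f, p in dist.items():
            mixed[f] += (w / z) * p
    return dict(mixed)


# Toy data: 2-D "author embeddings" and hand-written style features (illustrative only).
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
features = [["formal"], ["formal", "long sentences"], ["emoji"], ["emoji", "slang"]]

centroids, labels = kmeans(embeddings, k=2)
dists = cluster_style_distributions(labels, features, k=2)
print(interpret([0.0, 0.0], centroids, dists))
```

Because the query embedding lies next to the first two authors, almost all of the mixture weight goes to their cluster, so `formal` and `long sentences` dominate the resulting distribution. A hard assignment (taking only the nearest representative point) would be a simpler alternative; the soft version keeps every new text expressible as one distribution over style features.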
Abstract
Recent state-of-the-art authorship attribution (AA) methods learn authorship representations of texts in a latent, non-interpretable space, which hinders their usability in real-world applications. Our work proposes a novel approach to interpreting these learned embeddings: we identify representative points in the latent space and use LLMs to generate informative natural language descriptions of the writing style of each point. We evaluate the alignment of our interpretable space with the latent one and find that it achieves the best prediction agreement among the compared baselines. Additionally, we conduct a human evaluation to assess the quality of these style descriptions, validating their utility as explanations for the latent space. Finally, we investigate whether human performance on the challenging AA task improves when aided by our system's explanations, finding an average accuracy improvement of around +20%.
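One way to operationalize the alignment evaluation is sketched below. The exact protocol behind the reported agreement and correlation figures is not given in this summary, so the following is an assumption: (a) the Pearson correlation between pairwise cosine similarities computed in the latent space and in the interpretable space, and (b) top-1 prediction agreement, i.e. the fraction of queries whose most similar candidate author is the same in both spaces. The helper names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either list has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def similarity_correlation(latent, interp):
    """Assumed alignment metric: Pearson r between pairwise similarities in both spaces.

    latent[i] and interp[i] are two representations of the same text or author.
    """
    pairs = list(combinations(range(len(latent)), 2))
    lat = [cosine(latent[i], latent[j]) for i, j in pairs]
    itp = [cosine(interp[i], interp[j]) for i, j in pairs]
    return pearson(lat, itp)


def prediction_agreement(q_lat, c_lat, q_int, c_int):
    """Assumed agreement metric: share of queries whose top-1 candidate matches across spaces."""
    def top1(q, cands):
        return max(range(len(cands)), key=lambda j: cosine(q, cands[j]))

    hits = sum(top1(ql, c_lat) == top1(qi, c_int) for ql, qi in zip(q_lat, q_int))
    return hits / len(q_lat)


# Toy usage: latent vectors vs. hypothetical style-feature distributions for four authors.
latent = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
interp = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.6, 0.3], [0.2, 0.5, 0.3]]
print(similarity_correlation(latent, interp))
```

The two metrics capture different things: the correlation checks whether the interpretable space preserves the overall geometry of the latent space, while prediction agreement checks only whether both spaces would attribute each query to the same candidate, which is what matters for the downstream AA decision.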
