IXAM: Interactive Explainability for Authorship Attribution Models
Milad Alshomary, Anisha Bhatnagar, Peter Zeng, Smaranda Muresan, Owen Rambow, Kathleen McKeown
TL;DR
This paper presents IXAM, an interactive explainability tool for embedding-based authorship attribution models that lets end users explore regions of the latent space, identify shared authorial styles, and highlight the text spans that support a model prediction. By combining a 2D visualization of the latent space (t-SNE), region-specific stylistic analysis via LLMs, and Gram2Vec-based features, IXAM supports user-driven, post-hoc explanations that go beyond static baselines. A user study indicates that IXAM improves perceived usefulness and user confidence compared to predefined explanations, although the explanations remain post-hoc and may not be fully faithful to the model's actual decision process. The work advances practical explainability for the forensic and NLP-developer communities by providing an extensible, open-source interface compatible with HuggingFace models and LLM-assisted style analysis.
Abstract
We present IXAM, an Interactive eXplainability framework for Authorship Attribution Models. Given an authorship attribution (AA) task and an embedding-based AA model, our tool enables users to interactively explore the model's embedding space and construct an explanation of the model's prediction as a set of writing style features at different levels of granularity. Through a user evaluation, we demonstrate the value of our framework compared to predefined stylistic explanations.
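To make the setting concrete, the following is a minimal sketch of the kind of embedding-based AA prediction that such a tool would explain. It is not IXAM's implementation: the function names (`attribute`, `nearest_region`), the use of cosine similarity, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the embeddings would come from a trained AA model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def attribute(query, candidates):
    """Return the candidate author most similar to the query, plus all scores.

    `candidates` maps an author name to that author's embedding (hypothetical layout).
    """
    scores = {author: cosine(query, emb) for author, emb in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

def nearest_region(query, authors, k):
    """Return the k authors closest to the query, most similar first.

    This stands in for the region of the embedding space a user might inspect.
    """
    ranked = sorted(authors, key=lambda a: cosine(query, authors[a]), reverse=True)
    return ranked[:k]

# Toy embeddings (assumed values, not model output).
candidates = {
    "author_a": [0.9, 0.1, 0.0],
    "author_b": [0.1, 0.9, 0.2],
    "author_c": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

predicted, scores = attribute(query, candidates)
region = nearest_region(query, candidates, k=2)
```

Here `predicted` is the attribution to be explained, and `region` lists the neighbouring authors whose shared writing style could be analysed to build that explanation.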
