IXAM: Interactive Explainability for Authorship Attribution Models

Milad Alshomary; Anisha Bhatnagar; Peter Zeng; Smaranda Muresan; Owen Rambow; Kathleen McKeown

TL;DR

This paper presents IXAM, an interactive explainability tool for embedding-based authorship attribution models. It lets end-users explore regions of the latent space, identify writing styles shared by the authors in a region, and highlight the text spans that support a model prediction. IXAM combines a 2D visualization of the latent space (t-SNE), LLM-based stylistic analysis of user-selected regions, and Gram2Vec-based features, so that users can build their own post-hoc explanations rather than rely on static baselines. A user study indicates that IXAM improves perceived usefulness and user confidence compared to predefined explanations, although the explanations remain post-hoc and may not faithfully reflect the model's actual decision process. The work advances practical explainability for forensic and NLP-developer communities by providing an extensible, open-source interface compatible with HuggingFace models and LLM-assisted style analysis.

Abstract

We present IXAM, an Interactive eXplainability framework for Authorship Attribution Models. Given an authorship attribution (AA) task and an embedding-based AA model, our tool enables users to interactively explore the model's embedding space and construct an explanation of the model's prediction as a set of writing style features at different levels of granularity. Through a user evaluation, we demonstrate the value of our framework compared to predefined stylistic explanations.
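To make the setting concrete, the following is a minimal sketch of how an embedding-based AA model might turn embeddings into a prediction, assuming (this is not stated in the abstract) that attribution is done by ranking candidate authors by cosine similarity to the query document. The function names `cosine` and `rank_candidates` and the three-dimensional toy vectors are hypothetical and for illustration only; a real model would produce much higher-dimensional embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query, candidates):
    """Return (author, similarity) pairs, most similar first.

    Assumption: the predicted author is the candidate whose embedding
    is closest to the query document's embedding.
    """
    scored = [(name, cosine(query, emb)) for name, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for the output of an AA model (hypothetical values).
candidates = {
    "author_a": [0.9, 0.1, 0.0],
    "author_b": [0.1, 0.8, 0.3],
    "author_c": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A similarity ranking of this kind says which author is closest but not why, which is the gap that an explanation in terms of writing style features is meant to fill.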
