IXAM: Interactive Explainability for Authorship Attribution Models

Milad Alshomary, Anisha Bhatnagar, Peter Zeng, Smaranda Muresan, Owen Rambow, Kathleen McKeown

TL;DR

This paper presents IXAM, an interactive explainability tool for embedding-based authorship attribution models that enables end-users to explore latent-space regions, identify shared authorial styles, and highlight text spans supporting model predictions. By combining 2D latent-space visualization (t-SNE), region-specific stylistic analysis via LLMs, and Gram2Vec-based features, IXAM supports post-hoc, user-driven explanations beyond static baselines. A user study indicates IXAM enhances perceived usefulness and user confidence compared to predefined explanations, though explanations remain post-hoc and may not fully capture model faithfulness. The work advances practical explainability for forensic and NLP-developer communities by providing an extensible, open-source interface compatible with HuggingFace models and LLM-assisted style analysis.

Abstract

We present IXAM, an Interactive eXplainability framework for Authorship Attribution Models. Given an authorship attribution (AA) task and an embedding-based AA model, our tool enables users to interactively explore the model's embedding space and construct an explanation of the model's prediction as a set of writing style features at different levels of granularity. Through a user evaluation, we demonstrate the value of our framework compared to predefined stylistic explanations.
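The abstract does not say how an embedding-based AA model arrives at its prediction. As a minimal sketch, assuming the common setup in which each author is represented by one embedding vector and the predicted author is the candidate closest to the mystery author under cosine similarity, the prediction step might look like the following. The function names, the candidate labels, and the three-dimensional toy vectors are hypothetical and are not taken from the paper or from IXAM's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(mystery, candidates):
    """Return (name, similarity) pairs sorted from most to least similar.

    `mystery` is the mystery author's embedding; `candidates` maps a
    candidate name to that candidate's embedding (assumed layout).
    """
    scores = {name: cosine(mystery, emb) for name, emb in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy embeddings for illustration only; a real AA model would produce
# high-dimensional vectors from the authors' texts.
mystery = [0.9, 0.1, 0.3]
candidates = {
    "candidate_1": [0.1, 0.9, 0.2],
    "candidate_2": [0.2, 0.2, 0.9],
    "candidate_3": [0.8, 0.2, 0.4],
}

ranking = rank_candidates(mystery, candidates)
predicted_author = ranking[0][0]
```

An explanation tool of the kind described here would then need to account for why the top-ranked candidate sits close to the mystery author in the embedding space, for example by listing the writing style features the two share.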

Paper Structure

This paper contains 18 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: Our approach for building an interactive tool for exploring style explanations for AA models.
  • Figure 2: Our tool allows users to visualize the latent space of the AA model (top left), inspect a relevant subregion to see its common writing style features (top right), and show example spans from the text of the task authors for each of the features
  • Figure 3: An example task consisting of a mystery author and three candidate authors. For presentation purposes, we highlight the spans of the top three features of both the LLM and Gram2Vec feature types. The top three LLM-derived style features are "Exclamations", "Expressions of uncertainty", and "Direct address". The top three Gram2Vec features are "Part-of-Speech Bigram:Punctuation followed by Proper noun", "Morphology Tag:Quotation punctuation type", and "Morphology Tag:Initial punctuation". The Mystery Author's texts and Candidate 3's texts clearly have the most highlighted spans across the feature types.
  • Figure 4: Survey questions along with the score distributions given by the three participants in the study.
  • Figure 5: Prompts that are used in our system
  • ...and 3 more figures