iBERT: Interpretable Style Embeddings via Sense Decomposition

Vishal Anand, Milad Alshomary, Kathleen McKeown

TL;DR

iBERT introduces a multi-sense encoder that represents each token as a sparse convex mixture over $k=8$ static sense vectors, yielding inherently interpretable and controllable embeddings that can be pooled into sentence representations or used at the token level. Trained with masked language modeling and direct-style supervision, it achieves strong results on STEL, SoC, and PAN while enabling targeted edits and probing of stylistic axes. The architecture extends Backpack-style sense representations to an encoder with global pooling, enabling axis-aligned interpretability, modular edits, and robust generalization across style- and content-related tasks. This work advances interpretable representation learning by providing a practical backbone for style-aware retrieval, debiasing, and controlled generation, with the promise of broader sociolinguistic relevance and safer, more transparent NLP systems.

Abstract

We present iBERT (interpretable-BERT), an encoder that produces inherently interpretable and controllable embeddings, designed to modularize and expose the discriminative cues present in language, such as stylistic and semantic structure. Each input token is represented as a sparse, non-negative mixture over k context-independent sense vectors, which can be pooled into sentence embeddings or used directly at the token level. This enables modular control over the representation before any decoding or downstream use. To demonstrate our model's interpretability, we evaluate it on a suite of style-focused tasks. On the STEL benchmark, it improves style representation effectiveness by ~8 points over SBERT-style baselines while maintaining competitive performance on authorship verification. Because each embedding is a structured composition of interpretable senses, we highlight how specific style attributes, such as emoji use, formality, or misspelling, can be assigned to specific sense vectors. While our experiments center on style, iBERT is not limited to stylistic modeling. Its structural modularity is designed to interpretably decompose whichever discriminative signals are present in the data, enabling generalization even when supervision blends stylistic and semantic factors.
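The representation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the sense table, the random stand-in scores, the top-2 softmax used to obtain sparse non-negative weights, the toy dimension, and mean pooling are all assumptions chosen only to show the shape of the computation (k=8 senses per token, a sparse convex mixture per token, then pooling into one sentence vector). All function names are hypothetical.

```python
import math
import random

K = 8    # senses per token (k=8, as stated in the TL;DR)
DIM = 4  # toy embedding width; an assumption for illustration only


def make_sense_table(vocab, k=K, dim=DIM, seed=0):
    """Hypothetical static table: each word maps to k context-independent vectors."""
    rng = random.Random(seed)
    return {w: [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for w in vocab}


def sparse_convex_weights(scores, top=2):
    """Softmax over the `top` largest scores, zero elsewhere.

    Assumption: top-k softmax is just one way to get sparse, non-negative
    weights that sum to 1; the paper's mechanism may differ.
    """
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top]
    m = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - m) for i in keep}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(scores))]


def token_embedding(senses, weights):
    """Weighted sum of one token's sense vectors."""
    dim = len(senses[0])
    return [sum(w * s[d] for w, s in zip(weights, senses)) for d in range(dim)]


def mean_pool(vectors):
    """Average token vectors into a single sentence vector (assumed pooling)."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


tokens = ["so", "cool", "!!"]
table = make_sense_table(tokens)

# Stand-in for contextual sense scores; a real encoder would compute these
# from the surrounding context.
rng = random.Random(1)
weights = [sparse_convex_weights([rng.gauss(0.0, 1.0) for _ in range(K)])
           for _ in tokens]

sentence = mean_pool([token_embedding(table[t], w)
                      for t, w in zip(tokens, weights)])
```

Because each token's weights are explicit, a single sense can be inspected or zeroed out before pooling, which is what makes this kind of representation editable.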

Paper Structure

This paper contains 61 sections, 12 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: iBERT encodes tokens via k interpretable senses, producing editable and composable sense activations that are used either individually at the token level or pooled via configurable strategies into a global, interpretable embedding suitable for NLP pipelines.
  • Figure 2: Style-edit t-SNEs for iBERT-v3-10, ablating the sense $\ell$ most aligned with a given style. Red: original positive samples; Blue: edited (ablated) positive samples; Gray: negative samples. Arrows point from the original-positive centroid to the edited-positive centroid. Edits control semantics along (a) visual form, (b) syntactic function, and (c) grammatical commitment. $\Delta$Dist: relative decrease in the mean distance of positive samples to the negative centroid.
  • Figure 3: t-SNE projections of sentence embeddings for SynthSTEL. iBERT consistently separates contrastive style variants (e.g., misspelled sentences, emoji usage) better than Vanilla-SBERT, showing clearer margins and lower entanglement. Blue points represent positive samples and gray crosses represent negative samples. Although iBERT-v2 is the weakest iBERT variant, it still matches Vanilla's performance, and its separation of positive and negative styles is cleaner.
  • Figure 4: All Upper Case/Proper Capitalization ($\ell$=1) maintains separation when other styles are ablated. Colors as in Fig. 2: Red: original, Blue: edited, Gray: negatives.
  • Figure 5: (a) shows the detailed iBERT architecture; (b–c) illustrate two pooling strategies used in sentence encoding (v1 and v2). The softmax-blend pooling variant (v3) lies between these and is described in the subsection on iBERT sentence embeddings. (d) details the sense construction block.
  • ...and 1 more figure
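The $\Delta$Dist quantity in the Figure 2 caption (relative decrease in the mean distance of positive samples to the negative centroid) can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance is assumed, the caption does not say whether distances are measured in embedding space or in the t-SNE projection, and the function names and toy points are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def mean_distance_to(points, target):
    """Mean Euclidean distance from each point to `target` (assumed metric)."""
    return sum(math.dist(p, target) for p in points) / len(points)


def delta_dist(original_pos, edited_pos, negatives):
    """Relative decrease in mean distance of positives to the negative centroid."""
    neg_c = centroid(negatives)
    before = mean_distance_to(original_pos, neg_c)
    after = mean_distance_to(edited_pos, neg_c)
    return (before - after) / before


# Toy 2-D points, invented for illustration only.
negatives = [[0.0, 0.0], [0.0, 2.0]]     # centroid is (0, 1)
original_pos = [[4.0, 1.0], [6.0, 1.0]]  # mean distance 5.0
edited_pos = [[2.0, 1.0], [3.0, 1.0]]    # mean distance 2.5

print(delta_dist(original_pos, edited_pos, negatives))  # prints 0.5
```

Under this reading, a positive value means the edited samples moved toward the negative cluster, and a value near zero means the edit left the style separation intact.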