SPINE: SParse Interpretable Neural Embeddings

Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Eduard Hovy

TL;DR

SPINE tackles the interpretability gap in dense word embeddings by transforming pre-trained vectors (e.g., GloVe, word2vec) into sparse, non-negative representations using a denoising $k$-sparse autoencoder. It introduces a novel loss combining Reconstruction Loss with Average Sparsity Loss and Partial Sparsity Loss, and employs cap-ReLU activations to enforce non-negativity and sparsity. Large-scale crowdsourced intrusion tests and diverse downstream tasks show SPINE yields highly interpretable dimensions and competitive or improved performance relative to GloVe, word2vec, and SPOWV. The work highlights the practical value of post-hoc interpretability in neural embeddings and suggests retrofitting sparse representations can enhance model transparency without requiring massive corpus re-training.

Abstract

Prediction without justification has limited utility. Much of the success of neural models can be attributed to their ability to learn rich, dense and expressive representations. While these representations capture the underlying complexity and latent trends in the data, they are far from being interpretable. We propose a novel variant of denoising k-sparse autoencoders that generates highly efficient and interpretable distributed word representations (word embeddings), beginning with existing word representations from state-of-the-art methods like GloVe and word2vec. Through large-scale human evaluation, we report that our resulting word embeddings are much more interpretable than the original GloVe and word2vec embeddings. Moreover, our embeddings outperform existing popular word embeddings on a diverse suite of benchmark downstream tasks.
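
To make the three loss terms and the cap-ReLU activation named in the TL;DR concrete, here is a minimal NumPy sketch of one forward pass. This summary does not state the formulas, so the forms below are assumptions: squared-error reconstruction, a penalty on mean hidden activations that exceed a target sparsity, and a z(1 - z) term that pushes each activation toward 0 or 1. The function names, the single-layer encoder and decoder, and the values of `rho`, `lam_asl`, `lam_psl`, and `noise_std` are all hypothetical.

```python
import numpy as np

def cap_relu(x):
    """Capped ReLU: clip activations to [0, 1], giving non-negative, bounded codes."""
    return np.clip(x, 0.0, 1.0)

def spine_loss(X, W_enc, b_enc, W_dec, b_dec,
               rho=0.15, lam_asl=1.0, lam_psl=1.0,
               noise_std=0.2, rng=None):
    """One forward pass of a denoising sparse autoencoder (illustrative only).

    X is an (n, d) batch of dense embeddings; hidden codes Z have shape (n, h).
    All hyperparameter defaults are placeholders, not values from the paper.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Denoising: corrupt the input, but reconstruct the clean input.
    X_noisy = X + rng.normal(scale=noise_std, size=X.shape)
    Z = cap_relu(X_noisy @ W_enc + b_enc)
    X_hat = Z @ W_dec + b_dec

    # Reconstruction loss: mean squared error per example.
    rl = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    # Average sparsity loss (assumed form): penalize hidden units whose
    # mean activation over the batch exceeds the target rho.
    asl = np.sum(np.maximum(0.0, Z.mean(axis=0) - rho) ** 2)
    # Partial sparsity loss (assumed form): z * (1 - z) is zero only at 0 or 1.
    psl = np.mean(np.sum(Z * (1.0 - Z), axis=1))

    return rl + lam_asl * asl + lam_psl * psl, Z

# Usage with random stand-ins for pre-trained embeddings (d=5) and h=20 hidden units.
rng = np.random.default_rng(42)
X = rng.normal(size=(8, 5))
W_enc, b_enc = rng.normal(size=(5, 20)), np.zeros(20)
W_dec, b_dec = rng.normal(size=(20, 5)), np.zeros(5)
loss, Z = spine_loss(X, W_enc, b_enc, W_dec, b_dec, rng=rng)
```

Because `cap_relu` bounds every activation to [0, 1], each of the three terms is non-negative, and the rows of `Z` play the role of the sparse, non-negative embeddings that the interpretability tests examine.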

Paper Structure

This paper contains 23 sections, 8 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Depiction of our $k$-sparse autoencoder for an input word 'internet'. Our variant of the $k$-sparse autoencoder attempts to reconstruct the input at its output layer, with only a few active hidden units (depicted in green). These active units correspond to an interpretable set of dimensions associated with the word 'internet'. The rest of the dimensions (depicted in orange) are inactive for this word.
  • Figure 2: A sample intrusion detection question. Here, 'visual' is the intruder word.