
Explainable Graph Spectral Clustering For GloVe-like Text Embeddings

Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Piotr Borkowski, Dariusz Czerski, Eryk Laskowski

TL;DR

This work extends explainable graph spectral clustering from term vector spaces to GloVe-like embeddings, addressing how to interpret cluster membership when documents are represented through word vectors. It develops several embedding-based formulations (the L-, N-, R-, K-, and B-embeddings) and proves approximate equivalences among them, so that clustering results can be explained in terms of the words occurring in the documents. Experiments with WikiGloVe, TweetGloVe, and term vector space (TVS) representations assess clustering quality and the clarity of explanations across these variants, identifying settings where GloVe improves or degrades performance and where the explanations remain intuitive. The results offer practical guidance on when to prefer GloVe-based explainability over traditional term-vector approaches, and they lay the groundwork for incorporating non-linear embeddings in future work.
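As a minimal sketch of what embedding documents "via word vectors" can mean in practice, the Python snippet below builds a document vector by averaging the GloVe-like vectors of the document's in-vocabulary words and then compares documents by cosine similarity. The averaging scheme, the helper names `document_vector` and `cosine`, and the toy two-dimensional vectors are illustrative assumptions. They are not the paper's L-, N-, R-, K-, or B-embeddings.

```python
import math

def document_vector(tokens, word_vectors):
    """Average the vectors of the in-vocabulary tokens of a document.

    Hypothetical helper: plain averaging is one common way to build a
    GloVe-like document embedding; the paper's variants may differ.
    """
    dims = len(next(iter(word_vectors.values())))
    total = [0.0] * dims
    count = 0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary words
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    if count == 0:
        return total  # all-zero vector for a document with no known words
    return [t / count for t in total]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

# Toy 2-D "word vectors", made up purely for illustration.
toy = {"goal": [1.0, 0.0], "match": [0.9, 0.1],
       "vote": [0.0, 1.0], "election": [0.1, 0.9]}
d1 = document_vector(["goal", "match"], toy)
d2 = document_vector(["vote", "election", "unknownword"], toy)
print(round(cosine(d1, d2), 3))  # low similarity: the two toy documents share no topic
```

Words missing from the vocabulary are simply ignored here; weighting words (for example by term frequency) would be an equally plausible design choice.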

Abstract

In a previous paper, we introduced an approach to explaining Graph Spectral Clustering results for textual documents in the setting where document similarity is computed as cosine similarity in term vector space. In this paper, we generalize this idea to other document embeddings, in particular those based on the GloVe embedding idea.
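The baseline setting recalled in the abstract (cosine similarity in term vector space feeding graph spectral clustering) can be sketched generically as follows. This is a textbook pipeline under stated assumptions, not the authors' implementation: raw term counts serve as document vectors, the similarity graph has no self-loops, the combinatorial Laplacian L = D − S is used, the k eigenvectors with the smallest eigenvalues form the spectral embedding, and a small k-means with deterministic farthest-point initialisation assigns the clusters. All function names are hypothetical.

```python
import numpy as np

def term_vectors(docs):
    """Raw term-count vectors (one row per document) over a sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d.split():
            X[r, index[w]] += 1.0
    return X, vocab

def cosine_similarity_matrix(X):
    """Pairwise cosine similarities with the diagonal zeroed (no self-loops)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # guard against empty documents
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)
    return S

def kmeans(Y, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(S, k):
    """Cluster the rows of the k smallest-eigenvalue eigenvectors of L = D - S."""
    L = np.diag(S.sum(axis=1)) - S  # combinatorial graph Laplacian
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return kmeans(eigvecs[:, :k], k)

docs = ["goal match team goal", "team match win", "goal team win match",
        "vote election party", "party vote candidate", "election candidate vote party"]
X, vocab = term_vectors(docs)
S = cosine_similarity_matrix(X)
labels = spectral_clustering(S, k=2)
print(labels)
```

In this toy corpus the two topical groups share no terms, so the similarity graph has two connected components and the first three and last three documents receive different labels. Real corpora rarely separate this cleanly, which is where the choice of Laplacian and embedding starts to matter.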

Paper Structure

This paper contains 18 sections, 63 equations, 11 figures, 19 tables.

Figures (11)

  • Figure 1: Left: Objects sorted by increasing average of their top 5% similarities (black line) and by increasing average of their lowest 5% similarities (green line). Right: Objects sorted by increasing difference between the average of their top 5% similarities and the average of their lowest 5% similarities. Dataset TWT.3. Embedding WikiGloVe.
  • Figure 2: Top: Largest and smallest similarities within intrinsic clusters. Left: Average similarities within and outside of intrinsic clusters. Right: Differences between average similarities within and outside of clusters. Dataset TWT.3. Embedding WikiGloVe.
  • Figure 3: The $L$-based clustering viewed from three different perspectives (axes 1 and 2, 2 and 3, or 1 and 3). Colors indicate clusters. Dataset TWT.3. Embedding WikiGloVe.
  • Figure 4: The $K$-based clustering viewed from three different perspectives (axes 1 and 2, 2 and 3, or 1 and 3). Colors indicate clusters. Dataset TWT.3. Embedding WikiGloVe.
  • Figure 5: The $N$-based clustering viewed from three different perspectives (axes 1 and 2, 2 and 3, or 1 and 3). Colors indicate clusters. Dataset TWT.3. Embedding WikiGloVe.
  • ...and 6 more figures
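The per-object statistics described in the captions of Figures 1 and 2 can be computed from a similarity matrix roughly as below. This is a sketch under assumptions: self-similarity on the diagonal is excluded, "5%" is rounded up so that at least one neighbour is used, and the intrinsic clusters of Figure 2 are supplied as an array of reference labels. Sorting a returned array (for example with `np.sort`) yields a curve of the kind the captions describe. The function names and the toy matrix are hypothetical.

```python
import math
import numpy as np

def top_bottom_profile(S, fraction=0.05):
    """Per object: mean of its top and of its lowest `fraction` of similarities (Figure 1)."""
    n = S.shape[0]
    m = max(1, math.ceil(fraction * (n - 1)))  # assumption: round up, keep at least one
    top_avg = np.empty(n)
    low_avg = np.empty(n)
    for i in range(n):
        others = np.sort(np.delete(S[i], i))  # ascending, self-similarity removed
        low_avg[i] = others[:m].mean()
        top_avg[i] = others[-m:].mean()
    return top_avg, low_avg, top_avg - low_avg

def within_outside(S, labels):
    """Per object: mean similarity to the rest of its intrinsic cluster and to all other objects (Figure 2)."""
    labels = np.asarray(labels)
    n = S.shape[0]
    within = np.full(n, np.nan)
    outside = np.full(n, np.nan)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False  # exclude the object itself
        other = labels != labels[i]
        if same.any():
            within[i] = S[i, same].mean()
        if other.any():
            outside[i] = S[i, other].mean()
    return within, outside

# Hypothetical 4-object similarity matrix with intrinsic clusters {0, 1} and {2, 3}.
S = np.array([[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.3, 0.0],
              [0.1, 0.3, 1.0, 0.8],
              [0.2, 0.0, 0.8, 1.0]])
top, low, diff = top_bottom_profile(S)
within, outside = within_outside(S, [0, 0, 1, 1])
curve = np.sort(diff)  # objects ordered by increasing top-minus-lowest difference
```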